Are there any (hopefully simple) utilities to brute-force strip Javascript from an HTML file - or better yet, a (local) directory with more than one HTML file? I vaguely recall seeing a few such programs a long, long time ago... back in the early 90s, I think.
But all I can find now are mostly scripts, one shareware webadmin utility that I'm not sure would even do exactly what I want here, and a single freeware program that, unfortunately, strips out all the HTML code as well as the Javascript. I'm looking to keep the HTML as it is, because most HTML-to-text utilities, or anything else that strips HTML code, usually lose the formatting as well.
		
	If cameras add ten pounds, why would people want to eat them?
- 
	Just delete it yourself?
 Of course, it's there for functionality...
 It may lose formatting because the stylesheet gets deleted by mistake.
- 
	Yeah, I can do that. But it can get rather tedious for a large directory full of HTML pages.  
 
 It's mostly for stripping Javascript ad redirect code (the 'Click Here to Continue' redirect ads) or IntelliText ads from archived webpages, since my HTML viewer, while decent enough, doesn't allow me to switch off Javascript. I'm not worried about stylesheets, really.
- 
	Are you sure you can't turn off Javascript? It's one click in Opera or Firefox (with NoScript).
 
 Anyway, a simple way to deactivate them is with a text editor.
 For instance, I use UltraEdit.
 I can do "change in files" and select a folder, and it will do a search-and-replace in every file in that folder.
 
 So you could change
 <script
 to
 <!--
 and
 /script>
 to
 -->
 
 which would turn all the scripts into invisible comments.
 
 There are many text-editing utilities that can do similar operations, going back to sed, the Unix stream editor, and its DOS clones.
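The same two replacements can be scripted. Below is a minimal Python 3 sketch of the idea, assuming the page has already been read into a string; comment_out_scripts is a made-up name. It differs from a literal search-and-replace in two ways: it ignores case, since older pages often use <SCRIPT>, and it replaces the whole opening tag rather than just "<script". A bare <script> would otherwise become "<!-->", which HTML5 parsers treat as an empty comment that is already closed, leaving the script text visible. One limitation remains: if a script body itself contains "-->" (as in the old "//-->" hiding trick), the comment ends early and stray text may show.

```python
import re

# Whole opening tag, e.g. <script> or <SCRIPT language="JavaScript">.
# Assumption: no ">" inside attribute values, which holds for typical ad code.
OPEN_TAG = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
# Closing tag, allowing optional whitespace as in </script >.
CLOSE_TAG = re.compile(r"</script\s*>", re.IGNORECASE)


def comment_out_scripts(html: str) -> str:
    """Turn every <script>...</script> into <!--...--> (hypothetical helper)."""
    html = OPEN_TAG.sub("<!--", html)
    return CLOSE_TAG.sub("-->", html)


print(comment_out_scripts('<p>Hi</p><SCRIPT type="text/javascript">ad();</SCRIPT>'))
# prints: <p>Hi</p><!--ad();-->
```

To cover a whole folder you would read each file, pass its text through the function, and write it back.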
- 
	I'm sure. I'm using a smaller stand-alone HTML viewer in this case, and it doesn't have the option. (I archive a lot of text, HTML and RTF files (okay, I'll say it: it's fanfiction), and I didn't want to load a large browser just to view HTML pages in an archive.)
 
 Darn, I forgot all about search and replace. I've even got a few batch files already set up with an old global search-and-replace console utility to replace some common smart quotes in a file with their ASCII equivalents. I'd still prefer to try removing the Javascript, though. Maybe it'll save some space.
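For reference, a smart-quote replacement like the one described can also be done in Python with str.translate. This is a minimal sketch assuming the text has already been decoded with the file's real encoding; asciify_quotes is a made-up name, and the table covers only the four common curly quotes, not entity forms such as &rsquo;.

```python
# Map the common Unicode "smart" quotes to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def asciify_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes (hypothetical helper)."""
    return text.translate(SMART_QUOTES)


print(asciify_quotes("\u201cIt\u2019s fine,\u201d she said."))
# prints: "It's fine," she said.
```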
- 
	If they're all from the same site, every page will have the same scripts. So you can S&R the whole script and just delete it.
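When every page really does carry the identical script, an exact-text deletion is enough and no pattern matching is needed. A minimal Python 3 sketch follows; AD_SCRIPT is only a placeholder for the script block you would copy verbatim from one of the pages, and delete_exact is a made-up name. The match is byte-for-byte, so the copied block must have the same spacing and line endings as the pages. Files are overwritten in place, so try it on a copy of the folder first.

```python
from pathlib import Path

# Placeholder only: paste the exact script block copied from one of the pages.
AD_SCRIPT = b'<script type="text/javascript">showAd();</script>'


def delete_exact(folder: str, snippet: bytes) -> list[str]:
    """Remove every exact copy of snippet from the .htm/.html files in folder
    (hypothetical helper). Files are overwritten in place; returns changed names."""
    changed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in (".htm", ".html"):
            continue
        data = path.read_bytes()  # bytes, so no text encoding has to be guessed
        if snippet in data:
            path.write_bytes(data.replace(snippet, b""))
            changed.append(path.name)
    return changed
```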
 
 I'm sure there is a Perl utility that could parse the scripts out in a more general way. Look at some Perl scripting newsgroups or sites and ask there if you want to get into that; I'm not a Perl guru.
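For the more general case (pages from different sites, each with different scripts), deleting every whole <script>...</script> element with a pattern match may be enough, without a full HTML parser. Here is a minimal Python 3 sketch, not a tested tool: it assumes the pages sit directly in one folder, the names strip_scripts and strip_folder are made up for illustration, and it saves each original as name.html.bak before overwriting it. It works on raw bytes, so no text encoding has to be guessed. It only removes script elements; inline handlers such as onclick="..." and javascript: links stay in the page.

```python
import re
from pathlib import Path

# A whole <script ...> ... </script> element, any letter case, spanning lines.
# The non-greedy .*? stops at the first closing tag.
SCRIPT_BLOCK = re.compile(rb"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html: bytes) -> bytes:
    """Return html with every <script> element removed; other markup is untouched."""
    return SCRIPT_BLOCK.sub(b"", html)


def strip_folder(folder: str) -> int:
    """Strip scripts from each .htm/.html file directly inside folder.
    The original of each changed file is kept as <name>.bak.
    Returns the number of files changed."""
    changed = 0
    # sorted() lists the folder up front, so new .bak files don't affect the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in (".htm", ".html"):
            continue
        original = path.read_bytes()
        cleaned = strip_scripts(original)
        if cleaned != original:
            path.with_name(path.name + ".bak").write_bytes(original)
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

For example, strip_folder("archive") would process a folder called archive (a placeholder name) and return how many files it changed. Stopping at the first </script> is less crude than it looks, since browsers normally end a script element at that point too.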
 
 
 Perhaps a simpler method, then: if it's just paragraphs of text with no formatting, you can copy-and-paste from a browser into a text file. There are utilities that do that from the command line (lynx, for one, has a -dump option).
 See http://www.w3.org/Tools/html2things.html
 
 And here's a utility that does exactly what you want: http://www.jafsoft.com/detagger/remove-markup.html, but it costs $30.
- 
	They're not all from the same site. However, a global search and replace on all the files using the tags you mentioned above would probably work, of course.
 
 Neither am I. In searching, I found a number of Perl and PHP scripts that supposedly do it, but that doesn't help me much. Now, if it were Python, I could experiment with it a little...
 
 Most of it does have both formatting and styles, which I do want to keep. I've been experimenting with HTML-to-text converters for a while, and many of the ones I've tried end up removing both the formatting and the styles (I'm guessing because the original pages did all their formatting with tags). Some even crashed on the Javascript.
 There were one or two that did what I wanted, which was to keep the italics/bold/etc. by converting them to a 7-bit equivalent... but they were among the ones that also lost the formatting. Maybe I should experiment with those again. If only I knew more about scripts...
- 
	I remember using Advanced Replacer (shareware) by PearlFox 2 or 3 years ago for removing any text between HTML tags in multiple pages at once (that's what you want). It looks like it's no longer supported (there's garbage on its former home page), but it can still be googled and downloaded (I can't remember whether the trial works). Description here:
 
 http://www.freedownloadscenter.com/Utilities/Text_Search_and_Replace_Tools/Advanced_Replacer.html
 
 With the script %anything% you can easily remove banners from your pages.
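Assuming %anything% acts as a wildcard for any text between two fixed markers (not checked against Advanced Replacer's documentation), the same idea can be imitated in Python: escape the fixed parts and join them with a non-greedy ".*?". This is a hypothetical helper for illustration, not Advanced Replacer's actual syntax or behaviour, and the banner comments in the example are invented.

```python
import re

WILDCARD = "%anything%"


def wildcard_replace(text: str, pattern: str, replacement: str = "") -> str:
    """Replace every match of pattern, where %anything% stands for any run of
    characters (newlines included) and everything else is literal text."""
    parts = [re.escape(p) for p in pattern.split(WILDCARD)]
    regex = re.compile("(?s)" + ".*?".join(parts))  # (?s): '.' also matches newlines
    # A function replacement keeps any backslashes in `replacement` literal.
    return regex.sub(lambda m: replacement, text)


page = "<p>Ch. 1</p><!-- banner -->\n<a href='ads'>Buy!</a><!-- /banner --><p>Text</p>"
print(wildcard_replace(page, "<!-- banner -->%anything%<!-- /banner -->"))
# prints: <p>Ch. 1</p><p>Text</p>
```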
- 
	Well, you could do an S&R for the <B>, <I> and <P> tags and convert them to something like {B}, {I}, {P}.
 Then run an HTML strip, then convert the {} back to <>. But if they used funky <font...> tags or styles, as is likely if any MS app was used to generate them, you'd lose that.
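The placeholder trick can be sketched in Python as three regex passes: rename <b>, <i> and <p> (and their closing tags) to {B}, {I}, {P}, remove every other tag, then turn the braces back into angle brackets. This is a minimal sketch with simplifying assumptions: the kept tags carry no attributes, the text has no literal {B}-style braces of its own, and the crude tag-removal pattern deletes only tags, so script or style contents would survive as text. strip_except_basic is a made-up name. As the post says, <font> formatting is lost.

```python
import re

# Tags to keep; everything else is stripped. Assumes they have no attributes.
KEEP = ("b", "i", "p")

PROTECT = re.compile(r"<(/?)(%s)>" % "|".join(KEEP), re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]*>")  # crude: any remaining tag
RESTORE = re.compile(r"\{(/?)(%s)\}" % "|".join(k.upper() for k in KEEP))


def strip_except_basic(html: str) -> str:
    """Strip all tags except <b>, <i>, <p> and their closing tags."""
    # 1. <b> -> {B}, </i> -> {/I}, ...
    text = PROTECT.sub(lambda m: "{%s%s}" % (m.group(1), m.group(2).upper()), html)
    # 2. Remove every other tag.
    text = ANY_TAG.sub("", text)
    # 3. {B} -> <B>, {/I} -> </I>, ...
    return RESTORE.sub(r"<\1\2>", text)


print(strip_except_basic('<p><font face="Arial"><b>Bold</b> and <I>it</I></font></p>'))
# prints: <P><B>Bold</B> and <I>it</I></P>
```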
- 
	Curious to know what HTML "viewer" you're using;
 it doesn't sound very popular...
- 
	Well, it's slightly obscure... I think. It's the "Universal Viewer"/ATViewer (http://www.uvviewsoft.com/), which I believe was primarily written for use with the Total Commander file manager. It's a decent webpage/RTF/other viewer (well, certainly MUCH better than the ones I had been using). I haven't tried using it for text, though, since I'd already been using WnBrowse for several years.
 (I only use UV/ATV as a simple single-screen viewer, and not as a file-system browser... and was using it before they brought out a separate 'pay' version, so I've only needed the 'free' version.)
 
 I suppose it could have some deeply buried method for turning off Javascript (it uses the MSIE engine for HTML, though IIRC you can get a plugin that uses the Gecko engine instead; maybe I ought to see if I could turn off Javascript in that...).
 
 Not to mention the additional tag soup Word adds. :/ Doesn't Word also convert a lot of things to entities? (the &...; type character representations, for anyone else reading this)
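For what it's worth, decoding those entities is a one-liner with Python's standard library: html.unescape turns named references such as &rsquo; and numeric ones such as &#8230; into the actual characters. A small illustration with a made-up sample string:

```python
import html

sample = "Word&rsquo;s &ldquo;smart&rdquo; quotes &amp; ellipses&#8230;"
decoded = html.unescape(sample)
# decoded is now: Word’s “smart” quotes & ellipses…
```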
- 
	I would expect an HTML-to-ASCII converter to handle these correctly.
 
 (I like the function in Dreamweaver: "Fix Word HTML".)
 
 You might also look at HTML Tidy http://www.w3.org/People/Raggett/tidy/
 Running this first should simplify and clean up the code considerably.
 Some options:
 -clean, -c: replace FONT, NOBR and CENTER tags by CSS (clean: yes)
 -raw: output values above 127 without conversion to entities
 drop-font-tags: discard <FONT> and <CENTER> tags
 hide-comments: don't output comments (perhaps combining this with my previous suggestion of converting script tags to comments would get rid of them completely)
- 
	Oh, I already have a number of utilities to convert those with no problem... even a version of Tidy, somewhere. I'm not worried about that; it was more of a side comment about a few of the things Microsoft apps do to HTML files.
 
 (I don't remember if FrontPage also does that, though... never really tried using it beyond a couple of brief experiments with the stripped-down version they once 'included' with IE.)