Are there any (hopefully simple) utilities to brute-force strip Javascript from an HTML file - or better yet, a (local) directory with more than one HTML file? I vaguely recall seeing a few such programs a long, long time ago... back in the early 90s, I think.
But all I can find now are mostly scripts, one shareware webadmin utility that I'm not sure would even do exactly what I want here, and a single freeware program that - unfortunately - strips out all the HTML code along with the Javascript. I'm looking to keep the HTML as it is, because most HTML-to-text utilities (or anything else that strips HTML code) seem to lose the formatting as well, most of the time.
If cameras add ten pounds, why would people want to eat them?
-
Just delete it yourself?
Of course, it's there for functionality...
It may lose formatting because the stylesheet is mistakenly deleted -
Yeah, I can do that. But it can get rather tedious for a large directory full of HTML pages.
It's mostly for stripping Javascript ad redirect code (the 'Click Here to Continue' redirect ads) or IntelliText ads from archived webpages, since while my HTML viewer's decent enough, it doesn't allow me to switch off Javascript. I'm not worried about stylesheets, really.
-
Originally Posted by Ai Haibara
Anyway, a simple way to deactivate is with a text editor.
For instance, I use Ultraedit.
I can do "change in files" and select a folder, and it will do a search-and-replace in every file in that folder.
So you could change
<script
to
<!--
and
/script>
to
-->
which would turn all the scripts into invisible comments.
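For a whole folder, the same two replacements can also be scripted. What follows is only a rough Python sketch of that idea, not a tool anyone in this thread mentioned: the function names are made up, and decoding as latin-1 is an assumption chosen because it round-trips any byte values unchanged. It matches case-insensitively so that `<SCRIPT` is caught too, and it only looks at `*.htm*` files directly inside the given folder.

```python
import re
from pathlib import Path

# Case-insensitive versions of the two literal replacements described above.
OPEN_TAG = re.compile(r"<script", re.IGNORECASE)
CLOSE_TAG = re.compile(r"/script>", re.IGNORECASE)


def comment_out_scripts(html: str) -> str:
    """Turn '<script' into '<!--' and '/script>' into '-->'."""
    html = OPEN_TAG.sub("<!--", html)
    return CLOSE_TAG.sub("-->", html)


def process_folder(folder: str) -> int:
    """Rewrite the .htm/.html files in `folder` in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob("*.htm*"):
        # Work on bytes so line endings and odd characters are left alone.
        original = path.read_bytes().decode("latin-1")
        updated = comment_out_scripts(original)
        if updated != original:
            path.write_bytes(updated.encode("latin-1"))
            changed += 1
    return changed
```

Two caveats worth checking in your own viewer: a bare `<script>` tag becomes `<!-->`, which some parsers treat as a comment that is already closed, and a script body that itself contains `-->` (old pages often wrap scripts in `<!-- ... -->`) ends the comment early, so part of the code may show up as text. Try it on a copy of the folder first.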
There are many text-editing utilities that can do similar operations, going back to "sed", the Unix stream editor, and DOS clones of that. -
I'm sure. I'm using a stand-alone smaller HTML viewer, in this case, and it doesn't have the option. (I archive a lot of text, HTML and RTF files (okay, I'll say it - it's fanfiction), and I didn't want to load a large browser just to view HTML pages in an archive.)
Darn, I forgot all about search and replace. I've even got a few batch files already set up with an old global-search-and-replace console utility to replace some common SmartQuotes in a file with their ASCII equivalents. I'd still prefer to try removing the Javascript, though. Maybe it'll save some space. -
Originally Posted by Ai Haibara
I'm sure there is a perl utility that could parse the scripts out in a more general way. Look at some perl scripting newsgroups or sites and ask there if you want to get into that; I'm not a perl guru.
it's fanfiction
See http://www.w3.org/Tools/html2things.html
And here's a utility that does exactly what you want: http://www.jafsoft.com/detagger/remove-markup.html, but it costs $30. -
Originally Posted by AlanHK
Originally Posted by AlanHK
In searching, I found a number of Perl and PHP scripts that supposedly do it, but that doesn't help me, much. Now, if it was Python, I could experiment with it a little...
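Since Python came up, here is a minimal sketch to experiment with. It removes whole `<script>...</script>` blocks rather than commenting them out, and walks a directory tree. It is a regex-based approximation, not a real HTML parser: the names `strip_scripts` and `strip_tree`, the `.bak` backup suffix and the latin-1 decoding are all assumptions made for illustration. It does not touch inline handlers such as `onclick=` or `javascript:` links.

```python
import re
from pathlib import Path

# A whole <script ...>...</script> element, across lines, in any letter case.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                          re.IGNORECASE | re.DOTALL)


def strip_scripts(html: str) -> str:
    """Return `html` with every script element removed."""
    return SCRIPT_BLOCK.sub("", html)


def strip_tree(root: str) -> int:
    """Strip scripts from every .htm/.html file under `root`; return files changed."""
    changed = 0
    # Build the list first, because backup files are created while we loop.
    for path in list(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".htm", ".html"}:
            continue
        raw = path.read_bytes()
        text = raw.decode("latin-1")  # latin-1 round-trips any byte values
        cleaned = strip_scripts(text)
        if cleaned != text:
            # Keep the untouched original next to the cleaned file.
            path.with_name(path.name + ".bak").write_bytes(raw)
            path.write_bytes(cleaned.encode("latin-1"))
            changed += 1
    return changed
```

Calling `strip_tree` on a copy of the archive folder would be the cautious way to try it; a script tag that is never closed is left alone by this pattern.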
Originally Posted by AlanHK
There were one or two that did what I wanted, which was to keep the italics/bold/etc. by converting them to a 7-bit equivalent... but they were among the ones that also lost the formatting. Maybe I should experiment with those again. If only I had more knowledge about scripts... -
I remember using Advanced Replacer (shareware) by PearlFox 2 or 3 years ago for removing any text between HTML tags in multiple pages at once (that's what you want). Looks like it is not supported now (garbage on former home page). But it still can be googled and downloaded (can't remember whether trial works). Description here:
http://www.freedownloadscenter.com/Utilities/Text_Search_and_Replace_Tools/Advanced_Replacer.html
With the script %anything% you can easily remove banners from your pages. -
Originally Posted by Ai Haibara
Take the tags you want to keep (<B>, <I>, <P>) and convert them to something like {B}, {I}, {P}.
Then run an HTML strip; then convert the {} back to <>. But if they used funky <font..> tags or styles, as is likely if any MS app was used to generate them, you'd lose that. -
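The placeholder round trip described above could look roughly like this in Python. It is a sketch under assumptions: the `KEEP` list, the function name and the crude "strip anything between < and >" step stand in for whatever HTML stripper is actually used, and attributes on the kept tags are dropped.

```python
import re

KEEP = ("b", "i", "p")  # tags to preserve; an illustrative choice

# An opening or closing tag whose name is in KEEP, with optional attributes.
KEEP_TAG = re.compile(r"<(/?)(%s)\b[^>]*>" % "|".join(KEEP), re.IGNORECASE)
# Any tag at all (a crude stand-in for a real HTML stripper).
ANY_TAG = re.compile(r"<[^>]*>")
# The placeholders produced in step 1, e.g. {b} or {/i}.
PLACEHOLDER = re.compile(r"\{(/?)(%s)\}" % "|".join(KEEP))


def strip_all_but_kept(html: str) -> str:
    """Strip every tag except those in KEEP, via {tag} placeholders."""
    # Step 1: <B> -> {b}, </I> -> {/i}, <p class="x"> -> {p}
    text = KEEP_TAG.sub(lambda m: "{%s%s}" % (m.group(1), m.group(2).lower()), html)
    # Step 2: remove every tag that is still in angle brackets
    text = ANY_TAG.sub("", text)
    # Step 3: turn the placeholders back into tags
    return PLACEHOLDER.sub(r"<\1\2>", text)
```

A quick check of the behaviour:

```python
print(strip_all_but_kept('<div><P class="x">Some <b>bold</b> text<br></p></div>'))
# prints: <p>Some <b>bold</b> text</p>
```

Text that already contains something like `{b}` would be turned into a tag by step 3, and step 2 removes the script tags but leaves the script code behind as text, so scripts are better removed before this runs.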
curious to know what HTML "viewer" you're using
doesn't sound very popular... -
Well, it's slightly obscure... I think. It's the "Universal Viewer"/ATViewer (http://www.uvviewsoft.com/ ), which I believe was primarily written for use with the Total Commander shell, IIRC. It's a decent webpage/RTF/other viewer (well, certainly MUCH better than the ones I had been using). I haven't tried using it for text, though, since I'd already been using WnBrowse for several years.
(I only use UV/ATV as a simple single-screen viewer, and not as a file-system browser... and was using it before they brought out a separate 'pay' version, so I've only needed the 'free' version.)
I suppose it could have some deeply buried method of turning off Javascript (it uses the MSIE engine for HTML, though IIRC you can get a plugin that uses the Gecko engine instead. Maybe I ought to see if I could turn off Javascript in that...)
-
Originally Posted by Ai Haibara
(I like the function in Dreamweaver: "Fix Word HTML".)
You might also look at HTML Tidy http://www.w3.org/People/Raggett/tidy/
Running this first should simplify and clean up the code considerably.
Some options:
-clean, -c        replace FONT, NOBR and CENTER tags by CSS (clean: yes)
-raw              output values above 127 without conversion to entities
drop-font-tags    discard <FONT> and <CENTER> tags
hide-comments     * (perhaps this with my previous suggestion of converting script to comment tags would get rid of them completely) -
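If Tidy is driven from a script, the options listed above might be combined along these lines. This is a sketch under assumptions: it expects a `tidy` executable on the PATH, it adds `-m` (modify the file in place), which is not among the options quoted above, and it passes `hide-comments` in the `--option value` form. Which options a given Tidy build accepts varies by version, so check `tidy -help` first.

```python
import shutil
import subprocess
from typing import List, Optional


def tidy_command(path: str) -> List[str]:
    """Build a Tidy command line: clean up presentational tags and hide comments."""
    return ["tidy", "-m", "-c", "--hide-comments", "yes", path]


def run_tidy(path: str) -> Optional[int]:
    """Run Tidy on `path`; return its exit status, or None if Tidy is not installed."""
    if shutil.which("tidy") is None:
        return None
    return subprocess.run(tidy_command(path)).returncode
```

Because `-m` overwrites the input file, it is safer to run this on a copy of the archive.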
Oh, I already have a number of utilities to convert those, with no problem... even a version of Tidy, somewhere. I'm not worried about that; it was more of a side comment as to a few of the things Microsoft apps do to HTML files.
(I don't remember if FrontPage also does that, though... never really tried using it beyond a couple of brief experiments with the stripped-down version they once 'included' with IE.)