VideoHelp Forum




  1. VH Wanderer Ai Haibara
    Join Date
    Jan 2006
    Location
    Somewhere on VideoHelp...
    It's probably well-known (well, one hopes) that outputting a document to HTML in Microsoft Word adds a ton of additional code to the document, for no sane reason. I've been having to work with HTML pages from a handful of people, recently, and they use Word to create them, so... is there any sort of program/converter that'll take a Word-created HTML file and convert it to a more sensible HTML file, without all of Word's extra formatting in the way?

    I saw a program in passing, years ago, that claimed to do something like that - but I didn't have need for it, then, and don't know how well it worked... or certainly, have the bookmark now.

    I'm just hopeful there's something out there that I can feed the documents to and have it convert them, so I don't have to try to filter things out while I'm editing the files. (I can probably find some editor, somewhere, that'll reprocess it before it saves, but...)
    If cameras add ten pounds, why would people want to eat them?
  2. Member
    Join Date
    Sep 2007
    Location
    Europe
    I think this is all about it...

    http://www.codinghorror.com/blog/archives/000485.html

    Thread may have some links to something useful.
  3. Always Watching guns1inger
    Join Date
    Apr 2004
    Location
    Miskatonic U
    I don't know of anything free to clean up Word HTML, but Dreamweaver certainly had a specific import function to do it. It cleaned up FrontPage code as well.
  4. Video Restorer lordsmurf
    Join Date
    Jun 2003
    Location
    dFAQ.us/lordsmurf
    You're more or less screwed, from what I know. Manual clean-up.
    Want my help? Ask here! (not via PM!)
    FAQs: Best Blank Discs, Best TBCs, Best VCRs for capture, Restore VHS
  5. I've had good results in the past using HTML Tidy. The "bare" and "clean" options are designed for this. From the documentation for "clean":

    This option specifies if Tidy should strip out surplus presentational tags and attributes replacing them by style rules and structural markup as appropriate. It works well on the HTML saved by Microsoft Office products.
    http://tidy.sourceforge.net/docs/quickref.html

    The last time I had to do this was in the days of Office 2000, so your results may vary.
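
For anyone who wants to script it, here is a minimal sketch of driving Tidy from Python with the options mentioned above. It assumes a `tidy` executable is installed and on the PATH. The function names `build_tidy_command` and `clean_with_tidy` and the file names are made up for illustration, and adding `word-2000` on top of `bare` and `clean` is my own guess at a sensible combination, not something tested against current Word output.

```python
import shutil
import subprocess


def build_tidy_command(src, dst):
    """Build an HTML Tidy command line for Word-generated HTML (hypothetical helper)."""
    return [
        "tidy",
        "-quiet",              # suppress non-essential output
        "--bare", "yes",       # strip Microsoft-specific HTML
        "--clean", "yes",      # replace presentational tags with style rules
        "--word-2000", "yes",  # extra cleanup for Word "Save as Web Page" output
        "-output", str(dst),   # write the result here instead of stdout
        str(src),
    ]


def clean_with_tidy(src, dst):
    """Run Tidy on one file and return its exit code."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(src, dst), capture_output=True, text=True
    )
    # Tidy exits with 0 (no problems), 1 (warnings) or 2 (errors).
    return result.returncode
```

Calling `clean_with_tidy("word_export.html", "cleaned.html")` would then write the cleaned file next to the original. An exit code of 1 is normal for Word output, since Tidy warns about almost everything in it.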

    -drjtech
    They that give up essential liberty to obtain a little temporary safety deserve neither liberty or safety.
    --Benjamin Franklin
  6. Member AlanHK
    Join Date
    Apr 2006
    Location
    Hong Kong
    Originally Posted by Ai Haibara
    I saw a program in passing, years ago, that claimed to do something like that - but I didn't have need for it, then, and don't know how well it worked... or certainly, have the bookmark now.
    Probably Dreamweaver's "Fix Microsoft HTML".

    I just love the way MS HTML has pages and pages of style definitions, then when you get to the actual text, it ignores all those and just has lines of nested FONT codes and other braindead stuff.
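
To make that concrete, here is a rough sketch of stripping that kind of markup with nothing but Python's standard library. It is not what Dreamweaver or any other tool actually does. The class name `WordHtmlStripper`, the function `strip_word_markup`, and the choice of which tags and attributes to drop are all assumptions for illustration, and unwrapping every `span` is deliberately aggressive.

```python
from html import escape
from html.parser import HTMLParser

UNWRAP = {"font", "span"}                # drop the tag, keep the text inside
SKIP_CONTENT = {"style"}                 # drop the tag and everything inside
DROP_ATTRS = {"class", "style", "lang"}  # attributes Word sprays everywhere


class WordHtmlStripper(HTMLParser):
    """Re-emit HTML without FONT/SPAN wrappers, style blocks or Office tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        # Tags with a colon (o:p, v:shape, ...) are Office namespace tags.
        if self.skip_depth or tag in UNWRAP or ":" in tag:
            return
        kept = [(k, v) for k, v in attrs if k not in DROP_ATTRS]
        attr_text = "".join(
            f' {k}="{escape(v or "", quote=True)}"' for k, v in kept
        )
        self.out.append(f"<{tag}{attr_text}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> so no stray closing tag is emitted.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in UNWRAP or ":" in tag:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def strip_word_markup(html_text):
    parser = WordHtmlStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Comments, including Word's conditional comments, are dropped because the parser's default comment handler does nothing. Anything that relied on the removed classes for its appearance will lose that formatting, so this only suits documents where structure matters more than exact looks.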
  7. Member thecoalman
    Join Date
    Feb 2004
    Location
    Pennsylvania
    If you look around, there are some PHP scripts for doing this.
  8. VH Wanderer Ai Haibara
    Join Date
    Jan 2006
    Location
    Somewhere on VideoHelp...
    Originally Posted by AlanHK
    Originally Posted by Ai Haibara
    I saw a program in passing, years ago, that claimed to do something like that - but I didn't have need for it, then, and don't know how well it worked... or certainly, have the bookmark now.
    Probably Dreamweaver's "Fix Microsoft HTML".

    I just love the way MS HTML has pages and pages of style definitions, then when you get to the actual text, it ignores all those and just has lines of nested FONT codes and other braindead stuff.
    I remember that it was some sort of standalone program. I just didn't have a need for it at the time, and so, I didn't bother trying it. I did save the site in my bookmarks, though... turned on an old system to dig through the bookmarks, and an older version of this may have been it: http://www.bersoft.com/bwhcu/
    Of course, the current version's page also mentions using Dreamweaver to clean up the HTML on the bottom. The reference seems a little dated, though.

    I wouldn't be surprised if Word converts far more text to HTML entities than any other program I've used. A single line of text can become a monster paragraph!
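
As a small illustration of undoing that, Python's standard `html.unescape` turns entities back into plain characters. The helper name `decode_entities` and the sample string are made up, and this is only meant for runs of text: applied to a whole document it would also decode `&lt;` and `&amp;`, which need to stay escaped in real markup.

```python
import html


def decode_entities(text):
    """Replace named and numeric HTML entities with the characters they stand for."""
    return html.unescape(text)


sample = "&#8220;Caf&eacute;&#8221; &ndash; it&#39;s open"
print(decode_entities(sample))
```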

    Originally Posted by thecoalman
    If you look around, there are some PHP scripts for doing this.
    Yeah, I think I've even seen some perl scripts that begin to approach it, too. But the work's actually not crucial, so I'm just seeing if there's some simple executable I can just throw a handful of the Word HTML files at, and see if it generates something that's less of a headache to edit. If it was crucial/important work, I'd most likely open each file in my editor and manually edit everything (as lordsmurf mentions), just to be sure. Either that, or tell everyone I won't accept any HTML output from Word (and wait for the pitchfork-and-torch-bearing mob to form outside my door).

    I think I'll try the above utility and some of the standalone options mentioned in the link Chris K posted, and see what they make of one of the files. drjtech - I'll try HTML Tidy as well, though that codinghorror blog entry doesn't seem to think it'll do the trick.

    Thanks, everyone.
  9. Video Restorer lordsmurf
    Join Date
    Jun 2003
    Location
    dFAQ.us/lordsmurf
    I tell people all the time that I won't accept HTML from Word or PageMaker. That's really just tough shit on them. Those are not web creation applications.
  10. Member
    Join Date
    Oct 2004
    Location
    United States
    Tell them to send you a text file next time... it will be easier than trying to wade through the mess! :P
  11. VH Wanderer Ai Haibara
    Join Date
    Jan 2006
    Location
    Somewhere on VideoHelp...
    Originally Posted by lordsmurf
    I tell people all the time that I won't accept HTML from Word or PageMaker. That's really just tough shit on them. Those are not web creation applications.
    Hmm... well, I'll think about it. But with my luck, they'll just switch to OpenOffice's Writer... which probably does about the same thing just to maintain feature parity with Word.

    Originally Posted by greymalkin
    Tell them to send you a text file next time... it will be easier than trying to wade through the mess! :P
    Sure, make me reconstruct all the formatting.
  12. Member AlanHK
    Join Date
    Apr 2006
    Location
    Hong Kong
    Originally Posted by Ai Haibara
    Originally Posted by greymalkin
    Tell them to send you a text file next time... it will be easier than trying to wade through the mess! :P
    Sure, make me reconstruct all the formatting.
    Worth a try to open the MS-HTML file in a browser, or Word, and copy and paste it into a real HTML editor. That should produce better code, preserving formatting.
    (I think formatted text on the clipboard is basically RTF format.)
  13. VH Wanderer Ai Haibara
    Join Date
    Jan 2006
    Location
    Somewhere on VideoHelp...
    Hmm... that's a thought, too. I'll keep that in mind.