It's probably well-known (well, one hopes) that outputting a document to HTML in Microsoft Word adds a ton of additional code to the document, for no sane reason. I've been having to work with HTML pages from a handful of people, recently, and they use Word to create them, so... is there any sort of program/converter that'll take a Word-created HTML file and convert it to a more sensible HTML file, without all of Word's extra formatting in the way?
I saw a program in passing, years ago, that claimed to do something like that, but I didn't need it then, and I don't know how well it worked... and I certainly don't have the bookmark now.
I'm just hopeful there's something out there that I can feed the documents to and have it convert them, so I don't have to try to filter things out while I'm editing the files. (I can probably find some editor, somewhere, that'll reprocess it before it saves, but...)
If cameras add ten pounds, why would people want to eat them?
-
I think this is all about it...
http://www.codinghorror.com/blog/archives/000485.html
Thread may have some links to something useful. -
You're more or less screwed, from what I know. Manual clean-up.
Want my help? Ask here! (not via PM!)
FAQs: Best Blank Discs • Best TBCs • Best VCRs for capture • Restore VHS -
I've had good results in the past using HTML Tidy. The "bare" and "clean" options are designed to:
This option specifies if Tidy should strip out surplus presentational tags and attributes replacing them by style rules and structural markup as appropriate. It works well on the HTML saved by Microsoft Office products.
The last time I had to do this was in the days of Office 2000, so your results may vary.
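For anyone who wants to script this over a batch of files, here is a minimal sketch, assuming the `tidy` command-line tool is installed and on the PATH. The function names and file names are made up for illustration; the switches are Tidy's own `-clean` and `-bare` plus its `word-2000` config option. Results on files from newer versions of Word are not something this sketch promises.

```python
import subprocess
from pathlib import Path


def build_tidy_command(src: Path, dst: Path) -> list[str]:
    """Assemble a tidy invocation; nothing is executed here."""
    return [
        "tidy",
        "-quiet",
        "-clean",              # replace presentational tags such as FONT with style rules
        "-bare",               # per tidy's help text: strip smart quotes, em dashes, etc.
        "--word-2000", "yes",  # extra cleanup aimed at markup saved by Word 2000
        "-output", str(dst),
        str(src),
    ]


def clean_with_tidy(src: Path, dst: Path) -> int:
    # tidy exits with 1 on warnings and 2 on errors, so check=True is
    # deliberately not used; the caller decides what to do with the code.
    return subprocess.run(build_tidy_command(src, dst)).returncode
```

Calling `clean_with_tidy(Path("in.html"), Path("out.html"))` would write the cleaned copy next to the original rather than overwriting it, which makes it easy to diff the two and see what Tidy actually removed.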
-drjtech
They that give up essential liberty to obtain a little temporary safety deserve neither liberty or safety.
--Benjamin Franklin -
Originally Posted by Ai Haibara
I just love the way MS HTML has pages and pages of style definitions, then when you get to the actual text, it ignores all those and just has lines of nested FONT codes and other braindead stuff. -
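If the manual route is the only one left, the kind of junk described above (style blocks, nested FONT tags, Word-only attributes) can be stripped with a short script. This is a rough sketch using only Python's standard `html.parser`, not a tool anyone in this thread mentioned. The class name, function name, and the tag and attribute lists are assumptions chosen for illustration, and it will also throw away formatting you might want to keep.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose markup is dropped but whose inner text is kept (assumed list).
UNWRAP = {"font", "span", "o:p"}
# Tags dropped together with everything inside them (assumed list).
DROP_WITH_CONTENT = {"style"}
# Presentation-only attributes Word tends to add (assumed list).
DROP_ATTRS = {"class", "style", "lang"}


class WordHtmlStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in UNWRAP:
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, without a close tag.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in UNWRAP:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, including Word's conditional comments, disappear because
    # handle_comment is not overridden and the default emits nothing.


def strip_word_html(markup: str) -> str:
    parser = WordHtmlStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

The design choice here is to unwrap FONT and SPAN rather than delete them, so the text inside survives while the nesting goes away. Structural tags like `p` and `b` pass through with only their Word-specific attributes removed.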
Originally Posted by AlanHK
Of course, the current version's page also mentions using Dreamweaver to clean up the HTML at the bottom. The reference seems a little dated, though.
I wouldn't be surprised if Word converts text to HTML entities far more than any other program I've used. A single line of text can become a monster paragraph!
Originally Posted by thecoalman
I think I'll try the above utility and some of the standalone options mentioned in the link Chris K posted, and see what they make of one of the files. drjtech - I'll try HTML Tidy as well, though that codinghorror blog entry doesn't seem to think it'll do the trick, as much.
Thanks, everyone. -
I tell people all the time that I won't accept HTML from Word or PageMaker. That's really just tough shit on them. Those are not web creation applications.
-
tell them to send you a text file next time
..it will be easier than trying to wade through the mess! :P
-
Originally Posted by lordsmurf
But with my luck, they'll just switch to OpenOffice's Writer... which probably does about the same thing just to maintain feature parity with Word.
Originally Posted by greymalkin -
Originally Posted by Ai Haibara
(I think formatted text on the clipboard is basically RTF format.) -
Hmm... that's a thought, too. I'll keep that in mind.