Anybody know of a program which will quickly strip formatting codes from an srt file?
(i.e., the <b>, </i>, etc.)
-
If that doesn't work you can try using the find/replace-all function in any text editor (e.g. Word, or even Notepad).
-
I have many, and I don't know what might be in the files. So I have to look at every line to see what kinds of codes there might be, and then I might have to sit and do replace-all 50 times for different codes.
Too time consuming.
Hm, perhaps I'll have to write something myself. -
I've seen utilities designed to automatically strip HTML code from a file (usually a text file), though I couldn't tell you what they were: I saw at least one in passing some years ago and haven't looked for them since, so I don't know whether they can do batch work. The term to search for would probably be 'HTML Strip'.
On the other hand, if you can find an editor that'll allow you to batch-process files with a macro (probably a regular-expression function that removes the <> brackets and anything between them, at the least), that might work.
If cameras add ten pounds, why would people want to eat them? -
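For what it's worth, the "regular expression that removes the <> brackets and anything between them" idea, run over a whole folder, can be sketched in a few lines of Python. Nobody in the thread uses Python, so treat this as an illustration only: the `.clean.srt` output name, the function names, and the UTF-8 default are all assumptions (plenty of subtitle files use other encodings).

```python
import re
from pathlib import Path

# '<', then any run of characters that are not '>', then '>'.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Remove every <...> span from the text."""
    return TAG_RE.sub("", text)


def strip_folder(folder: str, encoding: str = "utf-8") -> list:
    """Write a tag-free copy of each .srt file in folder; return the new file names."""
    written = []
    for src in sorted(Path(folder).glob("*.srt")):
        if src.name.endswith(".clean.srt"):
            continue  # skip output left over from an earlier run
        # Hypothetical naming scheme: movie.srt -> movie.clean.srt
        dst = src.with_name(src.stem + ".clean.srt")
        dst.write_text(strip_tags(src.read_text(encoding=encoding)), encoding=encoding)
        written.append(dst.name)
    return written
```

Calling `strip_folder("C:/subs")` (a made-up path) would leave the originals alone and write the cleaned copies next to them, so nothing is lost if the pattern turns out to be too greedy for a particular file.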
Hmmmm, you might try a pattern substitution program.
Search for a perl or awk or sed script to filter out the formatting characters.
Then you could run it in batch.
Try searching "sed one-liners", "awk one-liners", etc.
If you are on Windows you can download free Windows versions of the programs. The thing is to find the right script or command line to do the filtering. The rest is relatively easy.
Last edited by MilesAhead; 30th Sep 2010 at 18:23.
http://milesaheadsoftware.org/
Fully enabled freeware for Windows PCs. -
Try using the spelling and grammar check in your favorite word processor along with a global replace for <i>, </i>, <b>, </b>, etc. When you come across an odd code, add it to your global search-and-replace list.
That said, I just posted in another thread about taking a little pride in what you do and this is another prime example.
It seems that too often the guy with the fastest and most numerous subtitle uploads is also the guy with the worst subtitles (poor formatting, misspellings and grammatical errors).
Personally, I'd rather wait for a proper subtitle that will enhance rather than detract from the movie.
-
Open the *.srt in Subtitle Workshop, select all subtitles (<CTRL> + A), right-click and you will get a pop-up menu which should be able to help you.
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -
If the only items you need to remove are tags that have less than '<' for tag start, greater than '>' for tag end, and stuff between, it should be simple to do with a sed or awk script. A one liner should do it. The only thing that may mess it up is if there's actually a '<' or '>' in the dialog.. which seems unlikely.
If you have a slew of them to do, stream editing is the way to go.
I haven't used Linux tools in quite a while, so don't ask me for the filter line. But searching for one-liners should show you how it works. Way better than loading 2000 files individually into a program.
btw, if you are on Windows there are free Linux tools that will run in a Windows command prompt... such as sed and awk and many others. Take a look here:
http://gnuwin32.sourceforge.net/packages.html
Last edited by MilesAhead; 2nd Oct 2010 at 12:04.
awk ain't so bad. I never much liked perl substitutions.
Seems the easiest way is to get a scripting language that lets you read a char at a time from the files.
You have a boolean variable, InEscSeq, which starts out FALSE.
You read a char from the input and see what it is. While InEscSeq is FALSE, copy each char to the output, unless it's '<', in which case set InEscSeq to TRUE. While InEscSeq is TRUE, read a char and throw it away; if that char was '>', set InEscSeq back to FALSE and go back to copying chars to the output until the next '<'.
It's pretty elementary. AutoIt3 may be easiest or vbscript.
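The InEscSeq idea is short enough to write out. Here is a minimal sketch in Python rather than AutoIt3 or VBScript (that swap is mine, as is the function name); it takes the whole text as a string instead of reading a file char by char, but the logic is the same.

```python
def strip_tags_by_char(text: str) -> str:
    """Copy text to the output one char at a time, dropping anything between '<' and '>'."""
    out = []
    in_esc_seq = False  # plays the role of the InEscSeq flag
    for ch in text:
        if in_esc_seq:
            # Inside a tag: throw the char away; a '>' ends the tag.
            if ch == ">":
                in_esc_seq = False
        elif ch == "<":
            # Start of a tag: drop the '<' and start discarding.
            in_esc_seq = True
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(strip_tags_by_char("<i>Hello</i> <b>world</b>"))
```

Note that a '>' seen while the flag is FALSE is copied through like any other char, so only a '<' ever starts a discard.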
edit: AutoIt3 is free and easy to learn. If you go here and tell them what you are trying to do, somebody will likely point you in the right direction:
http://www.autoitscript.com/forum/index.php?showforum=2
It's a great language to pick up for Windows even if you've never done any programming. Very simple syntax.
Last edited by MilesAhead; 2nd Oct 2010 at 18:01.
BTW
In Subtitles Modifier you can use the Remove comments option.
VideoAudio.pl - Serwis o technologii wideo & audio -
There is a > in every SRT timecode:
1
00:00:22,694 --> 00:00:26,779
HUMANS, MUTANTS OF NEW YORK AND
ELSEWHERE, SAY NO TO SYNTHETIC FLESH
So your script needs to be a bit smarter.
Usually the only codes are italics and bold, so just deleting all <i> <b> </b> and </i> should be sufficient. Do a global search for "<" to see if there are any others.
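The "global search for '<'" step can be automated so you see every distinct code before deleting anything. A rough Python sketch, assuming tags have the usual <...> shape; the function names are made up for illustration, and note that lowercasing also lowercases any attribute values inside a tag.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]*>")


def tag_inventory(text: str) -> Counter:
    """Count each distinct <...> tag in the text, ignoring case."""
    return Counter(tag.lower() for tag in TAG_RE.findall(text))


def strip_known_tags(text: str, names=("i", "b")) -> str:
    """Delete only the opening and closing tags whose names are listed."""
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"</?(?:%s)>" % alternatives, re.IGNORECASE)
    return pattern.sub("", text)


if __name__ == "__main__":
    sample = '<i>Time</i> magazine\n<font color="red">Warning</font>'
    print(tag_inventory(sample))
    print(strip_known_tags(sample))
```

Anything the inventory reports beyond italics and bold is then left in place by `strip_known_tags`, so it can be reviewed by hand instead of being deleted sight unseen.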
But I prefer to do it semi-manually.
I first delete all <i> at the beginning of a line, then review the remaining ones. Some of these should be converted to quote marks (eg, name of a magazine). Then delete the rest and any trailing </i>.
Last edited by AlanHK; 15th Nov 2014 at 20:23.
-
Not really. If you read the pseudo code, when '<' is encountered, escape sequence state becomes TRUE. Only while it's TRUE will encountering '>' return it to FALSE. The idea was not to produce an optimized program, but one a non-programmer could grasp right away. That's why if you look at many programming books they start out with file handling examples like copying a file such as
int c;
while ((c = fgetc(infile)) != EOF)
    fputc(c, outfile);
Nobody who has programmed longer than a day actually writes code like that. It's to get the idea across.
edit: in any case the ideal way would be streaming edit so that the files can be handled in batch or using globbing. Substitution expressions aren't my forte. That's why I didn't stick with Perl. But someone good with Linux stream tools like sed or awk or perl could probably knock this out with a one-liner. I'm not going to kill myself to do someone else's research. If you read and search you can find it if not write it yourself.
Last edited by MilesAhead; 12th Oct 2010 at 22:09.
Well, I didn't work through it, but I'll take your word for it.
I could probably read real code more easily than an explanation.
In any case, there is rarely anything other than simple italic and occasionally bold. If there is anything else, you might want to check it out rather than deleting it sight unseen. -
If you know what you are doing you can use an editor like Word, WordPad, etc. to remove these codes pretty easily. That's how I do it. It's a simple find-and-replace edit where you leave the "replace with" field empty.
-
This sed one-liner is worth a try. It's supposed to filter HTML tags, but it looks like it will remove anything starting with '<' and ending with '>'.
# remove most HTML tags (accommodates multiple-line tags)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
I'm not going to install sed to try it. The other thing is, people add whatever they want to try adding on to srt subs. So in addition to tags you may have placement escape sequences. Like a backslash with a number. So they can be hacked with just about anything. The person who needs to filter the extraneous stuff needs to do the trial and error to tune the filter for his particular input set. There's no way to know for sure what will be in the lines.
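To make the "tune the filter for your own input set" point concrete, here is a rough Python sketch of a filter list you can extend by trial and error. It is not the sed one-liner: it works one line at a time (so a tag split across two lines is missed), and it leaves timecode lines untouched so that a stray '<' in the dialog can never pair up with the '>' in a '-->' arrow. The timecode pattern assumes the standard SRT layout, and the brace pattern (for codes like {\an8}) is only an assumed example of a placement code; check what your own files actually contain.

```python
import re

# Assumes standard SRT timecodes such as 00:00:22,694 --> 00:00:26,779
TIMECODE_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

# Add or remove patterns here as you find out what is in your files.
FILTERS = [
    re.compile(r"<[^>]*>"),      # HTML-style tags, same expression as the sed line
    re.compile(r"\{\\[^}]*\}"),  # assumed example: brace-wrapped codes like {\an8}
]


def clean_srt(text: str, filters=FILTERS) -> str:
    """Apply every filter to each line except timecode lines."""
    cleaned = []
    for line in text.splitlines():
        if not TIMECODE_RE.match(line):
            for pattern in filters:
                line = pattern.sub("", line)
        cleaned.append(line)
    result = "\n".join(cleaned)
    # Keep the trailing newline if the input had one.
    return result + "\n" if text.endswith("\n") else result
```

Because each filter is just a compiled pattern in a list, adding a new one after spotting an unfamiliar code is a one-line change.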
Here's a bunch of sed one-liners:
http://sed.sourceforge.net/sed1line.txt
You can get free sed for Windows.
You can find tutorials.
My hand holding is done.
Good luck.
edit: ok, I have to concede due to the differences with Windows command line it's not as easy as it should be. If you have use for the Linux utilities then it's worth it to install a shell that can handle the scripts with the Linux syntax. But for one application it's a bit much to plow through.
I have a better solution. As others have noted, Regular Expressions should be part of the answer. But we want a command-line-driven program so the OP can process in batch. AutoHotKey has regex support and this forum's moderator is very adept in AHK. Also he likes to whip up custom small tools by request (which is what the forum is about). They call it Coding Snacks. Small programs to order.
Please read the rules for posting and post exactly what you would like the program to do:
http://www.donationcoder.com/Forums/bb/index.php?board=31.0
It shouldn't take skwire, or another volunteer, long to code it. I'm not great at regex so stuff I'd do with 8 passes they can probably knock out with one substitution string. They don't charge for creating programs but donations are appreciated.
Last edited by MilesAhead; 14th Oct 2010 at 18:24.
While you are waiting for skwire to code you something thorough with more options, this script should at least get rid of <x> and </x> tags.
For best results get AutoHotKey and compile it to .exe.
Or you can just run the script.
edit: I changed the script so that the "working directory" is whatever directory it was launched from. If the command prompt was at c:\temp then c:\temp is where it will look for files. That way you can just copy it to a folder in your path.
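So if this .exe works you should be able to batch convert with a command line like
for %s in (*.srt) do SrtStrip "%s"
(The quotes keep file names with spaces in one piece. Inside a .bat file, write %%s instead of %s.)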
Code:
#NoEnv
#NoTrayIcon
SendMode Input
If 0 < 1
{
    MsgBox, 64, SrtStrip Usage, SrtStrip Copyright (c) 2010 www.FavesSoft.com`n`nUsage: SrtStrip filename`n`nOutput is filename with .strip extension stripped of <> tags
    ExitApp
}
InFile := %0%
OutFile := InFile . ".strip"
Loop, read, %InFile%,%OutFile%
{
    result := RegExReplace(A_LoopReadLine,"<.>")
    result := RegExReplace(result,"</.>")
    FileAppend,%result%`n
}
Last edited by MilesAhead; 15th Oct 2010 at 12:34.
Or you can just download the compiled .exe from my page:
http://www.favessoft.com/downloads.html
SrtStrip 1.0
It's free for you to use at your own risk.
See included Readme.txt for caveats.
edit: download the latest version.
The substitution for each line now takes place in a single pass.
Told you I was lousy at RegEx. I have to trial and error it all the way.
Last edited by MilesAhead; 16th Oct 2010 at 11:48.
btw here's the new source in case you want to modify it to filter additional tag types or macros:
Code:
#NoEnv
#NoTrayIcon
SendMode Input
If 0 < 1
{
    MsgBox, 64, SrtStrip Usage, SrtStrip Copyright (c) 2010 www.FavesSoft.com`n`nUsage: SrtStrip filename.srt`n`nOutput is subtitle filename.srt.strip stripped of <> tags
    ExitApp
}
InFile := %0%
OutFile := InFile . ".strip"
IfExist,%OutFile%
    FileDelete,%OutFile%
Loop, read, %InFile%,%OutFile%
{
    result := RegExReplace(A_LoopReadLine,"</?.>")
    FileAppend,%result%`n
}
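If you do want to extend the filter to additional tag types, it helps to see what the "</?.>" expression does and does not catch. The Python snippet below is only a side-by-side illustration, not part of SrtStrip: it applies the same expression and a broader one (my assumption of a reasonable replacement) to a sample line. The '.' matches exactly one character, so only one-letter tags like <i> and <b> go away.

```python
import re

SINGLE_CHAR_TAG = re.compile(r"</?.>")   # the expression used in the script above
ANY_TAG = re.compile(r"</?[^<>]+>")      # broader alternative: any run of non-bracket chars


def strip_single_char_tags(line: str) -> str:
    """Remove only tags whose name is a single character, e.g. <i> or </b>."""
    return SINGLE_CHAR_TAG.sub("", line)


def strip_any_tag(line: str) -> str:
    """Remove any <...> or </...> tag, including ones with attributes."""
    return ANY_TAG.sub("", line)


if __name__ == "__main__":
    line = '<i>Hello</i> <font color="#ff0000">world</font>'
    print(strip_single_char_tags(line))
    print(strip_any_tag(line))
```

Neither expression touches a timecode line, since the '-->' arrow has no '<' for the match to start from.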
Edit2: I uploaded a new zip. I forgot to include the ahk include file in the zip. Now it should have everything needed to compile if you get the AHK_L scripting language (it's free).
Last edited by MilesAhead; 4th Sep 2012 at 22:03. Reason: new version available