VideoHelp Forum
Page 1 of 2
Results 1 to 30 of 31
Thread
  1. Member
    Join Date
    Feb 2001
    Location
    europe
    Anybody know of a program which will quickly strip formatting codes from an srt file?
    (i.e., the <b>, </i>, etc.)
  2. Member
    Join Date
    Feb 2001
    Location
    europe
    I'll give it a look.
  3. If that doesn't work, you can try using the find and Replace All functions in any text editor (e.g. Word, even Notepad)
  4. Member
    Join Date
    Feb 2001
    Location
    europe
    Originally Posted by poisondeathray
    If that doesn't work, you can try using the find and Replace All functions in any text editor (e.g. Word, even Notepad)
    Too time-consuming.
  5. DECEASED
    Join Date
    Jun 2009
    Location
    Heaven
    Originally Posted by branch
    Originally Posted by poisondeathray
    If that doesn't work, you can try using the find and Replace All functions in any text editor (e.g. Word, even Notepad)
    Too time-consuming.


    1) Only if you DON'T USE the "Replace All" feature;

    or

    2) You have some thousands of .SRT files to correct.
  6. Member
    Join Date
    Feb 2001
    Location
    europe
    Originally Posted by Richardm
    Crashes when I start it.
  7. Member
    Join Date
    Feb 2001
    Location
    europe
    Originally Posted by El Heggunte
    Originally Posted by branch
    Originally Posted by poisondeathray
    If that doesn't work, you can try using the find and Replace All functions in any text editor (e.g. Word, even Notepad)
    Too time-consuming.


    1) Only if you DON'T USE the "Replace All" feature;

    or

    2) You have some thousands of .SRT files to correct.
    I have many, and I don't know what might be in each file, so I would have to look at every line to see what kinds of codes there might be, and then I might have to sit and do Replace All 50 times for different codes.
    Too time-consuming.

    Hm, perhaps I'll have to write something myself.
  8. VH Wanderer Ai Haibara
    Join Date
    Jan 2006
    Location
    Somewhere on VideoHelp...
    I've seen utilities designed to automatically strip HTML code from a file (usually text files), though I couldn't tell you what they were (I've only seen one or two in passing, some years ago, and never went looking for them), and I don't know whether they can do batch work. The term to search for would probably be 'HTML Strip'.

    On the other hand, if you can find an editor that'll allow you to batch-process files with a macro (probably a regular-expression function that removes the <> brackets and anything between them, at the least), that might work.
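    A minimal sketch of that regular-expression idea, written in Python purely for illustration (the post doesn't name a language or editor, and `strip_tags` and `TAG_RE` are hypothetical names): the pattern `<[^>]*>` matches a '<', any run of characters other than '>', and the closing '>', and `re.sub` deletes every match.

```python
import re

# '<', then any characters except '>', then '>': matches <i>, </b>, <font color="red"> ...
TAG_RE = re.compile(r"<[^>]*>")

def strip_tags(text: str) -> str:
    """Remove the angle brackets and anything between them."""
    return TAG_RE.sub("", text)

print(strip_tags("<i>Hello</i>, <b>world</b>!"))  # Hello, world!
```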
    If cameras add ten pounds, why would people want to eat them?
  9. DECEASED
    Join Date
    Jun 2009
    Location
    Heaven
    Originally Posted by Ai Haibara
    strip HTML code
    IIRC, even the freeware versions of NoteTab and EditPad are capable of that
    (at least they were, a couple of years ago).
  10. Hmmmm, you might try a pattern substitution program.
    Search for a Perl, awk, or sed script to filter out the formatting characters.
    Then you could run it in batch.

    Try searching "sed one-liners",
    "awk one-liners", etc.

    If you are on Windows you can download free Windows versions of the
    programs. The thing is to find the right script or command line to do the
    filtering. The rest is relatively easy.
    Last edited by MilesAhead; 30th Sep 2010 at 19:23.
    http://milesaheadsoftware.org/
    Fully enabled freeware for Windows PCs.
  11. Member
    Join Date
    Jul 2007
    Location
    United States
    Try using the spell and grammar check in your favorite word processor along with the global replace for <i>, </i>, <b>, </b>, etc. When you come across an odd character, add it to your global search-and-replace list.
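    The same global-replace approach can be scripted so the list only has to be built once. A sketch in Python (an illustration, not a word-processor feature): each entry in the hypothetical `REPLACE_LIST` plays the role of one Replace All, and odd codes you come across are simply added to it. Matching is literal and case-sensitive, so `<I>` would need its own entry.

```python
# One entry per "Replace All"; extend this tuple as you find odd codes.
REPLACE_LIST = ("<i>", "</i>", "<b>", "</b>")

def apply_replace_list(text: str, codes=REPLACE_LIST) -> str:
    """Delete every literal occurrence of each code, in list order."""
    for code in codes:
        text = text.replace(code, "")
    return text

print(apply_replace_list("<i>Don't</i> <b>move!</b>"))  # Don't move!
print(apply_replace_list("<u>Run</u>"))                 # <u>Run</u> (not in the list yet)
```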

    That said, I just posted in another thread about taking a little pride in what you do and this is another prime example.

    Too often, the guy with the fastest and most subtitle uploads is the guy with the worst subtitles (poor formatting, misspellings, and grammatical errors).

    Personally, I'd rather wait for a proper subtitle that will enhance rather than detract from the movie.

  12. The root of all evil träskmannen
    Join Date
    May 2005
    Location
    Belgium
    Open the *.srt in Subtitle Workshop, select all subtitles (<CTRL> + A), right-click and you will get a pop-up menu which should be able to help you.
    In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.
  13. Member
    Join Date
    Feb 2001
    Location
    europe
    Originally Posted by träskmannen
    Open the *.srt in Subtitle Workshop, select all subtitles (<CTRL> + A), right-click and you will get a pop-up menu which should be able to help you.
    Still not optimal, but thanks.
  14. If the only items you need to remove are tags that start with a less-than sign '<', end with a greater-than sign '>', and have stuff in between, it should be simple to do with a sed or awk script. A one-liner should do it. The only thing that may mess it up is an actual '<' or '>' in the dialog, which seems unlikely.
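    To see what that caveat means in practice, here is a Python sketch (Python, the made-up dialog line, and both pattern names are illustrative choices, not from the thread). The catch-all pattern also eats a literal '<' ... '>' pair in the dialog; a stricter variant that requires a letter (optionally after '/') right after the '<' leaves ordinary comparisons alone, though text like "a<b and c>d" would still fool it.

```python
import re

LOOSE = re.compile(r"<[^>]*>")              # anything from '<' to the next '>'
STRICT = re.compile(r"</?[A-Za-z][^<>]*>")  # '<' or '</' must be followed by a letter

line = "If x < 5 and y > 3, <i>run</i>!"
print(LOOSE.sub("", line))   # If x  3, run!
print(STRICT.sub("", line))  # If x < 5 and y > 3, run!
```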

    If you have a slew of them to do, streaming edit is the way to go.
    I haven't used Linux tools in quite a while, so don't ask me for the filter line. But searching for one-liners should show you how it works. Way better than loading 2000 files individually into a program.

    btw, if you are on Windows, there are free Windows ports of Linux tools that run in a Windows command prompt, such as sed, awk, and many others. Take a look here:

    http://gnuwin32.sourceforge.net/packages.html
    Last edited by MilesAhead; 2nd Oct 2010 at 13:04.
  15. Member
    Join Date
    Feb 2001
    Location
    europe
    I assure you, to me there is nothing simple about sed or awk scripts.
  16. awk ain't so bad. I never much liked perl substitutions.

    Seems the easiest way is to get a scripting language that lets you read a char at a time from the files.
    You have a boolean variable, InEscSeq = FALSE.

    Read a char from the input and see what it is. If InEscSeq is FALSE, copy the char to the output, unless it's '<', in which case set InEscSeq to TRUE instead. While InEscSeq is TRUE, read chars and throw them away until you hit '>'; then set InEscSeq back to FALSE and go back to copying chars to the output until the next '<' makes InEscSeq TRUE again.
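    The character-at-a-time filter described above is short in any language. A Python sketch (the post suggests AutoIt3 or VBScript; Python is used here only for illustration, with the hypothetical `in_esc_seq` standing in for InEscSeq). Note that a '>' outside a tag is copied, so the '-->' arrow in SRT timecode lines passes through untouched.

```python
def strip_tags_charwise(text: str) -> str:
    """Copy text one character at a time, dropping everything from '<' through '>'."""
    out = []
    in_esc_seq = False              # the post's InEscSeq flag
    for c in text:
        if not in_esc_seq:
            if c == "<":
                in_esc_seq = True   # start of a tag: don't copy the '<'
            else:
                out.append(c)       # normal text, including a '>' outside a tag ("-->")
        elif c == ">":
            in_esc_seq = False      # end of the tag: drop the '>' and resume copying
        # any other character inside a tag is simply not copied
    return "".join(out)

print(strip_tags_charwise("<i>HUMANS</i>, MUTANTS"))  # HUMANS, MUTANTS
```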

    It's pretty elementary. AutoIt3 or VBScript may be easiest.

    edit: AutoIt3 is free and easy to learn. If you go here and tell them what you are trying to do, somebody will likely point you in the right direction:

    http://www.autoitscript.com/forum/index.php?showforum=2


    It's a great language to pick up for Windows even if you've never done any programming. Very simple syntax.
    Last edited by MilesAhead; 2nd Oct 2010 at 19:01.
  17. BTW
    In Subtitles Modifier you can use the Remove comments option.

    VideoAudio.pl - A website about video & audio technology
  18. Member
    Join Date
    Feb 2001
    Location
    europe
    Originally Posted by Placio74
    BTW
    In Subtitles Modifier you can use the Remove comments option.

    Hey, thanks, Placio74!

    It's always nice to find new software.

    I don't suppose that one has a command-line interface as well?
  19. Member AlanHK
    Join Date
    Apr 2006
    Location
    Hong Kong
    Originally Posted by MilesAhead
    If the only items you need to remove are tags that start with a less-than sign '<', end with a greater-than sign '>', and have stuff in between, it should be simple to do with a sed or awk script. A one-liner should do it. The only thing that may mess it up is an actual '<' or '>' in the dialog, which seems unlikely.

    There is a > in every SRT timecode:
    1
    00:00:22,694 --> 00:00:26,779
    HUMANS, MUTANTS OF NEW YORK AND
    ELSEWHERE, SAY NO TO SYNTHETIC FLESH

    So your script needs to be a bit smarter.
    Usually the only codes are italics and bold, so just deleting all <i> <b> </b> and </i> should be sufficient. Do a global search for "<" to see if there are any others.
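    The global search for '<' can also be automated into a tally of every distinct code in a file, which addresses the earlier worry about not knowing what's in it. A Python sketch with a made-up sample (`survey_tags` is a hypothetical name); the timecode's '>' is never counted because no '<' precedes it:

```python
import re
from collections import Counter

def survey_tags(srt_text: str) -> Counter:
    """Count each distinct <...> code in the text."""
    # A timecode like "00:00:22,694 --> 00:00:26,779" has a '>' but no '<',
    # so it never matches.
    return Counter(re.findall(r"<[^>]*>", srt_text))

sample = """1
00:00:22,694 --> 00:00:26,779
<i>HUMANS, MUTANTS OF NEW YORK AND</i>
ELSEWHERE, SAY NO TO <font color="red">SYNTHETIC FLESH</font>
"""
print(survey_tags(sample))
```

    Anything beyond the usual italic and bold tags that shows up in the tally is worth a look before it is deleted.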

    But I prefer to do it semi-manually.
    I first delete all <i> at the beginning of a line, then review the remaining ones. Some of these should be converted to quote marks (e.g., the name of a magazine). Then I delete the rest and any trailing </i>.
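    The two mechanical passes of that workflow could be scripted while the judgment call (turning a title into quote marks) stays manual. A Python sketch; `review_italics` and `finish_italics` are hypothetical names, and the input is assumed to be a list of subtitle text lines:

```python
def review_italics(lines):
    """Pass 1: drop <i> at line starts; report (line number, line) still containing <i>."""
    cleaned = [line[3:] if line.startswith("<i>") else line for line in lines]
    to_review = [(n, line) for n, line in enumerate(cleaned, start=1) if "<i>" in line]
    return cleaned, to_review

def finish_italics(lines):
    """Pass 2, after manual review: delete any remaining <i> and every </i>."""
    return [line.replace("<i>", "").replace("</i>", "") for line in lines]

lines = ["<i>Hello there</i>", "I read it in <i>Time</i> magazine"]
cleaned, to_review = review_italics(lines)
print(to_review)                 # [(2, 'I read it in <i>Time</i> magazine')]
print(finish_italics(cleaned))   # ['Hello there', 'I read it in Time magazine']
```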
    Last edited by AlanHK; 15th Nov 2014 at 21:23.
  20. Originally Posted by AlanHK
    Originally Posted by MilesAhead
    If the only items you need to remove are tags that start with a less-than sign '<', end with a greater-than sign '>', and have stuff in between, it should be simple to do with a sed or awk script. A one-liner should do it. The only thing that may mess it up is an actual '<' or '>' in the dialog, which seems unlikely.

    There is a > in every SRT timecode:
    1
    00:00:22,694 --> 00:00:26,779
    HUMANS, MUTANTS OF NEW YORK AND
    ELSEWHERE, SAY NO TO SYNTHETIC FLESH

    So your script needs to be a bit smarter.
    Not really. If you read the pseudocode, you'll see that when '<' is encountered, the escape-sequence state becomes TRUE, and only while it's TRUE will encountering '>' return it to FALSE. The idea was not to produce an optimized program, but one a non-programmer could grasp right away. That's why many programming books start out with file-handling examples like copying a file one character at a time:
    int c;
    while ((c = fgetc(infile)) != EOF)
        fputc(c, outfile);

    Nobody who has programmed longer than a day actually writes code like that. It's to get the idea across.

    edit: In any case, the ideal way would be a streaming edit, so that the files can be handled in batch or using globbing. Substitution expressions aren't my forte; that's why I didn't stick with Perl. But someone good with Linux stream tools like sed, awk, or Perl could probably knock this out with a one-liner. I'm not going to kill myself to do someone else's research. If you read and search, you can find it, or else write it yourself.
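    For reference, a streaming filter in the sed style (read a line, substitute, write it out) looks roughly like this in Python. It's a sketch, not a tested tool: `stream_strip` is a hypothetical name, and because it works line by line it won't catch a tag split across two lines.

```python
import io
import re

TAG_RE = re.compile(r"<[^>]*>")

def stream_strip(infile, outfile) -> None:
    """Copy infile to outfile line by line, deleting <...> tags."""
    for line in infile:
        outfile.write(TAG_RE.sub("", line))

src = io.StringIO("1\n00:00:22,694 --> 00:00:26,779\n<i>SAY NO</i>\n")
dst = io.StringIO()
stream_strip(src, dst)
print(dst.getvalue())
```

    In a script, `stream_strip(sys.stdin, sys.stdout)` would let it sit in a pipeline such as `python strip.py < in.srt > out.srt` (hypothetical script name), with a shell loop or glob handling the batch.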
    Last edited by MilesAhead; 12th Oct 2010 at 23:09.
  21. Member AlanHK
    Join Date
    Apr 2006
    Location
    Hong Kong
    Originally Posted by MilesAhead

    Not really. If you read the pseudocode.
    Well, I didn't work through it, but I'll take your word for it.
    I could probably read real code easier than an explanation.

    In any case, there is rarely anything other than simple italic and occasionally bold; if there's anything else, you might want to check it out rather than deleting it sight unseen.
  22. Originally Posted by AlanHK
    Originally Posted by MilesAhead

    Not really. If you read the pseudocode.
    Well, I didn't work through it, but I'll take your word for it.
    I could probably read real code easier than an explanation.

    In any case, there is rarely anything other than simple italic and occasionally bold; if there's anything else, you might want to check it out rather than deleting it sight unseen.
    I have a better idea. You write the real code and I'll critique it!
  23. Banned
    Join Date
    Oct 2004
    Location
    Freedonia
    If you know what you are doing, you can use an editor like Word, WordPad, etc. to remove these codes pretty easily. That's how I do it. It's a simple find-and-replace edit where you replace each code with nothing (an empty string).
  24. This sed one-liner is worth a try. It's supposed to filter HTML tags, but it looks like it will remove anything starting with '<' and ending with '>'.

    # remove most HTML tags (accommodates multiple-line tags)
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

    I'm not going to install sed to try it. The other thing is, people add whatever they like to .srt subs. So in addition to tags you may have placement escape sequences, like a backslash followed by a number; subs can be hacked with just about anything. The person who needs to filter out the extraneous stuff has to do the trial and error to tune the filter for his particular input set. There's no way to know for sure what will be in the lines.
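    The sed one-liner's trick (when a '<' is left over, append the next line with N and retry) exists to catch tags that span a line break. In Python the same effect falls out of running the pattern over the whole text at once, since `[^>]` also matches newlines. A sketch comparing the two approaches (hypothetical function names, made-up split tag):

```python
import re

TAG_RE = re.compile(r"<[^>]*>")

def strip_per_line(text: str) -> str:
    """Line at a time: a tag broken across lines survives as debris."""
    return "\n".join(TAG_RE.sub("", line) for line in text.split("\n"))

def strip_whole_text(text: str) -> str:
    """Whole text at once: [^>] matches newlines too, so split tags are removed."""
    return TAG_RE.sub("", text)

split_tag = '<font\ncolor="red">Hi</font>'
print(repr(strip_per_line(split_tag)))    # '<font\ncolor="red">Hi'
print(repr(strip_whole_text(split_tag)))  # 'Hi'
```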

    Here's a bunch of sed one-liners:

    http://sed.sourceforge.net/sed1line.txt

    You can get free sed for Windows.
    You can find tutorials.
    My hand holding is done.
    Good luck.

    edit: OK, I have to concede that, due to the differences with the Windows command line, it's not as easy as it should be. If you have use for the Linux utilities, then it's worth installing a shell that can handle the scripts with the Linux syntax. But for one application it's a bit much to plow through.

    I have a better solution. As others have noted, regular expressions should be part of the answer. But we want a command-line-driven program so the OP can process in batch. AutoHotkey has regex support, and this forum's moderator is very adept in AHK. Also, he likes to whip up small custom tools by request (which is what the forum is about). They call it Coding Snacks: small programs to order.

    Please read the rules for posting and post exactly what you would like the program to do:

    http://www.donationcoder.com/Forums/bb/index.php?board=31.0

    It shouldn't take skwire, or another volunteer, long to code it. I'm not great at regex so stuff I'd do with 8 passes they can probably knock out with one substitution string. They don't charge for creating programs but donations are appreciated.
    Last edited by MilesAhead; 14th Oct 2010 at 19:24.
  25. While you are waiting for skwire to code you something thorough with more options, this script
    should at least get rid of <x> and </x> tags.

    For best results, get AutoHotkey and compile it to an .exe.
    Or you can just run the script.

    edit: I changed the script so that the "working directory" is whatever directory it was launched from. If the command prompt was at c:\temp then c:\temp is where it will look for files. That way you can just copy it to a folder in your path.

    So if this .exe works, you should be able to batch convert with a command line like the following (inside a .bat file, write %%s instead of %s):

    for %s in (*.srt) do SrtStrip "%s"

    Code:
    #NoEnv
    #NoTrayIcon
    SendMode Input
    
    If 0 < 1
    {
    	MsgBox, 64, SrtStrip Usage, SrtStrip Copyright (c) 2010 www.FavesSoft.com`n`nUsage: SrtStrip filename`n`nOutput is filename with .strip extension stripped of <> tags
    	ExitApp
    }
    
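    ; Input is the file named on the command line; output is that name plus ".strip"
    InFile := %0%
    OutFile := InFile . ".strip"
    ; Inside this loop, FileAppend without a filename writes to OutFile
    Loop, read, %InFile%,%OutFile%
    {
        ; Remove opening and closing tags with a one-character name, e.g. <i> and </i>
        result := RegExReplace(A_LoopReadLine,"<.>")
        result := RegExReplace(result,"</.>")
        FileAppend,%result%`n
    }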
    Last edited by MilesAhead; 15th Oct 2010 at 13:34.
  26. Or you can just download the compiled .exe from my page:

    http://www.favessoft.com/downloads.html

    SrtStrip 1.0

    It's free for you to use at your own risk.
    See included Readme.txt for caveats.

    edit: download the latest version.
    The substitution for each line now takes place in a single pass.
    Told you I was lousy at RegEx. I have to trial and error it
    all the way.
    Last edited by MilesAhead; 16th Oct 2010 at 12:48.
  27. btw here's the new source in case you want to modify it to filter additional tag types or macros:

    Code:
    #NoEnv
    #NoTrayIcon
    SendMode Input
    
    If 0 < 1
    {
    	MsgBox, 64, SrtStrip Usage, SrtStrip Copyright (c) 2010 www.FavesSoft.com`n`nUsage: SrtStrip filename.srt`n`nOutput is subtitle filename.srt.strip stripped of <> tags
    	ExitApp
    }
    
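    ; Input is the file named on the command line; output is that name plus ".strip"
    InFile := %0%
    OutFile := InFile . ".strip"
    ; Delete output left from an earlier run, since it would otherwise be appended to
    IfExist,%OutFile%
        FileDelete,%OutFile%

    ; Inside this loop, FileAppend without a filename writes to OutFile
    Loop, read, %InFile%,%OutFile%
    {
        ; One pass: "</?.>" matches <x> or </x> where x is any single character
        result := RegExReplace(A_LoopReadLine,"</?.>")
        FileAppend,%result%`n
    }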
    Edit: If you go to my page shown in my sig and navigate to the Download page, you can get SrtStrip 1.31. The zip includes the AHK_L source code, exe, Readme.txt and a custom icon. I've made some improvements. The output file has "_strip" tacked onto the base filename of the input file, rather than changing the .srt extension. It's no longer limited to single-character tags. It also removes curly brace tags that are often used to highlight lyrics and replaces them with double greater-than characters to denote translated lyric lines.
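    For anyone modifying the script, the difference between the old single-character pattern and a broader one, plus the new output naming, can be sketched in Python. This is an illustration of the idea only, not SrtStrip's actual source: `ANY_TAG` and `output_name` are hypothetical, and the curly-brace lyric handling is not reproduced.

```python
import re
from pathlib import Path

# Pattern from the SrtStrip source above: "</?.>" only matches one-character tag names.
SINGLE_CHAR = re.compile(r"</?.>")
# A broader pattern (an assumption, not SrtStrip's): any tag name starting with a letter.
ANY_TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def output_name(infile: str) -> Path:
    """movie.srt -> movie_strip.srt: '_strip' goes on the base name, extension kept."""
    p = Path(infile)
    return p.with_name(p.stem + "_strip" + p.suffix)

line = '<i>a</i> <font color="red">b</font>'
print(SINGLE_CHAR.sub("", line))  # a <font color="red">b</font>
print(ANY_TAG.sub("", line))      # a b
print(output_name("movie.srt"))   # movie_strip.srt
```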

    Edit2: I uploaded a new zip. I forgot to include the AHK include file in the zip. Now it should have everything needed to compile if you get the AHK_L scripting language (it's free).
    Last edited by MilesAhead; 4th Sep 2012 at 23:03. Reason: new version available
    http://milesaheadsoftware.org/
    Fully enabled freeware for Windows PCs.
  28. Member skoville
    Join Date
    May 2012
    Location
    Silver Springs, NV.
    Originally Posted by träskmannen
    Open the *.srt in Subtitle Workshop, select all subtitles (<CTRL> + A), right-click and you will get a pop-up menu which should be able to help you.
    I use SWS to check sync. Had no idea it would remove tags. Always used an online remover but couldn't remember the name. SWS took about 2 seconds. Thanks.
  29. Member AlanHK
    Join Date
    Apr 2006
    Location
    Hong Kong
    Originally Posted by skoville
    I use SWS to check sync. Had no idea it would remove tags. Always used an on line remover but couldn't remember the name. SWS took about 2 seconds. Thanks.
    It's a new feature, added since this thread was started 4 years ago.

    Another thing to look out for: people often "tag" subs with an elaborate signature using colour tags, so searching for those can let you get rid of that too.
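    A Python sketch of that search, assuming the colour signatures use `<font color=...>` tags (common in .srt files, but an assumption; adjust the pattern if yours differ). The hypothetical `blocks_with_colour` lists the whole subtitle blocks so they can be checked before anything is deleted sight unseen:

```python
import re

# Assumption: colour signatures use <font color=...>; adjust if yours differ.
FONT_COLOR = re.compile(r"<font\s+color\s*=", re.IGNORECASE)

def blocks_with_colour(srt_text: str) -> list[str]:
    """Return the SRT blocks (number, timecode, text) containing a <font color=...> tag."""
    text = srt_text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", text)  # SRT blocks are separated by blank lines
    return [b for b in blocks if FONT_COLOR.search(b)]

sample = ('1\n00:00:01,000 --> 00:00:02,000\nHello\n\n'
          '2\n00:00:03,000 --> 00:00:04,000\n<font color="#ffff00">Subs by someone</font>\n')
for block in blocks_with_colour(sample):
    print(block)
```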


