VideoHelp Forum

  1. Member
    Join Date
    Sep 2005
    Location
    Darkest Peru
    I occasionally get subtitles in TX3G format (or even closed-caption extractions) that encode position information as leading spaces and assume the characters are monospaced.

    e.g.,
    Code:
    24
    00:01:43,736 --> 00:01:45,671
    PRACTICE STARTS
    IN 15 MINUTES!
    
    25
    00:01:45,738 --> 00:01:46,905
        BYE.
                        BYE.
    I've tried extracting with MP4BOX to .SRT and ffmpeg as .ASS/.SSA. They just keep the spaces but do nothing about it.
    Is there a way to extract or convert these subtitles so that the spaces are turned into X-position codes and then removed?

    Most software/hardware players ignore these spaces and blindly center all text without positional information.
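    ASS does have an explicit position override, \pos(x,y), which is one way to get the X-position codes asked about here. Below is a rough Python sketch of that approach, built on assumptions that do not come from this thread: the font is monospaced with a known character width in script pixels, column 0 sits at a fixed left margin, and each caption row becomes its own Dialogue event (a single \pos applies to a whole event, so rows with different indents cannot share one). The function names and default numbers (20 px per character, 40 px margin, bottom row at y=680, 40 px row height) are hypothetical and would need tuning to the script's PlayResX/PlayResY.

```python
def positioned_text(line: str, y: int, char_width: float = 20.0,
                    left_margin: int = 40) -> str:
    """Return ASS text with an explicit \\pos derived from the line's leading spaces.

    Assumes a monospaced font in which every character, spaces included,
    is char_width pixels wide in script coordinates (hypothetical defaults).
    """
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)
    x = round(left_margin + indent * char_width)
    # \an1 makes the \pos point the bottom-left corner of the text.
    return f"{{\\an1\\pos({x},{y})}}{stripped}"


def cue_to_events(start: str, end: str, lines: list[str], bottom_y: int = 680,
                  line_height: int = 40, style: str = "Default") -> list[str]:
    """One Dialogue event per caption row, stacked upward from bottom_y."""
    events = []
    last = len(lines) - 1
    for row, line in enumerate(lines):
        if not line.strip():
            continue  # blank rows only take up vertical space
        y = bottom_y - (last - row) * line_height
        events.append(f"Dialogue: 0,{start},{end},{style},,0,0,0,,"
                      f"{positioned_text(line, y)}")
    return events
```

    For the "BYE." cue above, cue_to_events("0:01:45.74", "0:01:46.91", ["    BYE.", " " * 20 + "BYE."]) yields two events, the upper one placed 4 character widths in from the margin and the lower one 20 widths in, so players no longer need the spaces to position the text.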
    Last edited by doctorm; 13th Jan 2016 at 18:57.
  2. Member Budman1
    Join Date
    Jul 2012
    Location
    NORTHWEST ILLINOIS, USA
    If you convert to SSA format, each space can be replaced with '\h' to keep it as a hard (non-breaking) space, and line returns can be replaced with '\N', as described in the SSA guide:
    http://moodub.free.fr/video/ass-specs.doc
    And .ASS tags at:
    http://docs.aegisub.org/manual/ASS_Tags (that's ASS<Underscore>Tags)

    This should be easy to do in any text editor that can search for runs of spaces and replace them with these codes.
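    The same search-and-replace can be scripted. A minimal Python sketch, assuming the input is the text field of an SSA/ASS Dialogue line; the function name is hypothetical. It turns every run of two or more spaces into the same number of \h codes and leaves single spaces between words alone (so a lone leading space is also left unchanged).

```python
import re


def hard_space_runs(text: str) -> str:
    """Replace every run of two or more spaces with the same number of \\h codes."""
    # With a function as the replacement, re.sub inserts the returned string
    # literally, so the backslash in \h needs no extra escaping.
    return re.sub(r" {2,}", lambda m: r"\h" * len(m.group(0)), text)


print(hard_space_runs(r"    BYE.\N                    BYE."))
```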

    Some overrides automatically apply to ALL the text - currently this is just alignment overrides, but more may be added later (eg. Shadow/outline depth overrides).

    \h Hard space not to be broken
    \n New line (carriage return)
    \n is ignored by SSA if “smart-wrapping” is enabled
    eg. This is the first line\nand this is the second

    \N New line (carriage return). This is used by SSA instead of \n if
    “smart-wrapping” is enabled.
    This will allow subtitles to be interpreted as:

    Dialogue: Marked=0,0:00:12.00,0:00:15.00,MainB,,0000,0000,0000,!Effect,{\a5}{\c&H80FF00&}Hello.
    Dialogue: Marked=0,0:00:12.00,0:00:15.00,MainB,,0000,0000,0000,!Effect,{\a5}{\c&HE0E0E0&}\h\h\h\h\h\h\hOh, hi.\N\h\h\h\h\h\h\h\h\h\h\h Bye.

    [Screenshot: ScreenHunter_187 Jan. 13 22.32.jpg]
  3. Member
    Join Date
    Sep 2005
    Location
    Darkest Peru
    So the lines in the OP would go from:
    Code:
    Dialogue: 0,0:01:43.74,0:01:45.67,Default,,0,0,0,,PRACTICE STARTS\NIN 15 MINUTES!
    Dialogue: 0,0:01:45.74,0:01:46.91,Default,,0,0,0,,    BYE.\N                    BYE.
    To:
    Code:
    Dialogue: 0,0:01:43.74,0:01:45.67,Default,,0,0,0,,{\a5}PRACTICE STARTS\NIN 15 MINUTES!
    Dialogue: 0,0:01:45.74,0:01:46.91,Default,,0,0,0,,{\a5}\h\h\h\hBYE.\N\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\hBYE.

    More or less by just left justifying everything and then search/replacing every two spaces with "\h\h"?
    Edit: Hmm. Not exactly. Tried \h\h\h\h for every two spaces, but I think the font needs to still be monospaced to keep from looking wrong.

    Edit 2: Okay, loaded it in Aegisub, set the style to left-bottom alignment and set the font to Lucida Console.
    Searched for all double spaces and replaced them with (\h\h). Saved it.

    Loaded it in Notepad, searched for (comma comma space) and replaced it with (comma comma \h) for any line whose text starts with a single space.
    Searched for all (\h space) and replaced them with (\h\h) for all cases where there was an odd number of spaces before the first word in the line.

    Looks fairly correct. A little sloppy in the script, but fine in the software player. Will try hardware next.
    Thanks for the help.
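    The manual steps above (left-aligning the event and turning every leading space into a \h, odd counts included) can be approximated in one pass. A minimal Python sketch, assuming ASS Dialogue lines in the default v4+ field order, where the text is everything after the ninth comma. The {\an1} bottom-left override stands in for the left-bottom style change made in Aegisub, and the function name is hypothetical. A monospaced font such as Lucida Console is still needed for the columns to line up.

```python
def left_align_dialogue(line: str) -> str:
    """Convert the leading spaces of each \\N-separated segment to \\h codes."""
    if not line.startswith("Dialogue:"):
        return line  # leave headers, styles and comments untouched
    # Nine commas separate the fixed fields; the text itself may contain commas.
    fields = line.split(",", 9)
    if len(fields) < 10:
        return line  # not a well-formed v4+ Dialogue line
    converted = []
    for seg in fields[9].split("\\N"):
        stripped = seg.lstrip(" ")
        indent = len(seg) - len(stripped)
        converted.append("\\h" * indent + stripped)
    # \an1 = bottom-left alignment (numpad layout) for this event only.
    fields[9] = "{\\an1}" + "\\N".join(converted)
    return ",".join(fields)
```

    Applied to every line of the .ass file, this gives the same result as the two Notepad passes without having to treat odd and even space counts separately.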
    Last edited by doctorm; 14th Jan 2016 at 18:58.
  4. Member Budman1
    Join Date
    Jul 2012
    Location
    NORTHWEST ILLINOIS, USA
    It's hard to tell how many spaces you have in the sample because of the font, but basically, yes: you should be able to search for every occurrence of multiple spaces and replace it with \h codes. In other words, search for <space><space> and replace with <\h><\h> (do not include the <> marks). You would only search the dialogue area for this, but in theory it would be the only area with double spaces anyway.

    You may also need to search for line breaks (the invisible <CRLF> character) within the dialogue only and replace them with <\N>.
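    When starting from an SRT file, the line breaks and the timestamps both have to change. A minimal Python sketch, assuming one SRT cue has already been split into its timing line and its text lines (reading the file into cues is left out); the function names are hypothetical. ASS times use centiseconds, so the milliseconds are rounded half-up here, which reproduces the times in the converted lines earlier in the thread (e.g. 00:01:46,905 becomes 0:01:46.91).

```python
import re

SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_to_ass_time(stamp: str) -> str:
    """'00:01:43,736' -> '0:01:43.74' (H:MM:SS.cc, rounded half-up)."""
    match = SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    # Work in total centiseconds so rounding can carry into seconds/minutes.
    total_cs = (((h * 60 + m) * 60 + s) * 1000 + ms + 5) // 10
    return "{}:{:02d}:{:02d}.{:02d}".format(
        total_cs // 360000, total_cs // 6000 % 60, total_cs // 100 % 60, total_cs % 100
    )


def srt_cue_to_dialogue(timing: str, text_lines: list[str], style: str = "Default") -> str:
    """Build one ASS Dialogue line; each SRT line break becomes \\N."""
    start, end = (part.strip() for part in timing.split("-->"))
    text = "\\N".join(line.rstrip("\r") for line in text_lines)
    return (f"Dialogue: 0,{srt_to_ass_time(start)},{srt_to_ass_time(end)},"
            f"{style},,0,0,0,,{text}")
```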

    Thanks
  5. Member
    Join Date
    Sep 2005
    Location
    Darkest Peru
    Using YAMB (an MP4Box GUI) to extract the subs added the \N automatically.
    But great ideas. Really made this work right.
  6. Member
    Join Date
    Sep 2005
    Location
    Darkest Peru
    I hate to revive this thread, but I'm trying to do this again and for the life of me I can't remember how I extracted the subs as .ass with ffmpeg.
  7. Member Budman1
    Join Date
    Jul 2012
    Location
    NORTHWEST ILLINOIS, USA
    If you are extracting text-based subtitles (not image-based ones such as DVDSub), use:

    Code:
    ffmpeg -i "C:\Users\Bud\Desktop\[dp]_Shinobi.mkv" -vn  -an -map 0:1 -c:s ass "C:\Users\Bud\Desktop\[dp]_Shinobi_2.ass"
    It works as shown in the screenshot below on a video that has only a video track and a subtitle track, no audio. For other files the command is the same; just change '-map 0:1' to the index of the subtitle stream you want (streams are numbered from 0).
    e.g. video, audio, subtitle 1 -> -map 0:2
    e.g. video, audio, subtitle 1, subtitle 2 -> -map 0:3 (for subtitle 2)

    [Screenshot: ScreenHunter_190 Mar. 23 23.12.jpg]
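    If counting streams is a nuisance, ffmpeg can also pick a subtitle stream by its position among subtitle streams only: '-map 0:s:0' is the first subtitle track of the first input, '0:s:1' the second, no matter how many video and audio streams come before them. Because -map selects streams explicitly, -vn and -an are not strictly needed. A minimal Python sketch that builds and runs such a command; it assumes ffmpeg is on the PATH, and the helper names are hypothetical.

```python
import subprocess


def build_ass_extract_command(src: str, dst: str, sub_index: int = 0) -> list[str]:
    """ffmpeg arguments that write one text subtitle stream of src to dst as ASS."""
    return [
        "ffmpeg",
        "-i", src,
        # 0:s:N = the Nth subtitle stream (counting from 0) of input file 0,
        # so video and audio streams do not affect the number.
        "-map", f"0:s:{sub_index}",
        "-c:s", "ass",  # convert the text subtitles to ASS
        dst,
    ]


def extract_as_ass(src: str, dst: str, sub_index: int = 0) -> None:
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ass_extract_command(src, dst, sub_index), check=True)
```

    For example, extract_as_ass("input.mkv", "input_sub2.ass", 1) would target the second subtitle track, whatever the audio layout of the file.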
  8. Member
    Join Date
    Sep 2005
    Location
    Darkest Peru
    Thanks, that seems to be what I needed. I use ffmpeg so infrequently that I never remember how to use it when I come back.


