VideoHelp Forum
  1. On recommendation of another forum member, I've started playing with ccextractor for Windows. It's clunky, but it's fast, and free.

    Two things:

    First, I'm having trouble with timings. I've tried all of the options in the "help", but the best I can get is subs that start out OK and then get REALLY out of time the farther the MPG plays. Can anyone assist with this?

    Second, I'd like to rig up some way to make this more of a streamlined process, and not have to go to the command line to use the app. I've thought about writing a batch file (once the timing issues are fixed) but as the MPG file changes every time the app is used, that's useless. A batch file would be OK for processing several MPG files in a row, but it would take longer for me to write it! Has anyone designed a frontend for ccextractor?
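
    One way around the changing file name is a small script that takes the MPG names as arguments and builds the ccextractor command for each one. A minimal sketch in Python, assuming the executable is called ccextractorwin.exe and can be found on the PATH. The only options used are -srt and -o, as seen in the command lines later in this thread. The helper names build_commands and run_all are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: executable name and location; change to wherever ccextractor lives.
CCEXTRACTOR = "ccextractorwin.exe"

def build_commands(mpg_files, exe=CCEXTRACTOR):
    """Return one ccextractor command (as an argument list) per input file.

    Each subtitle file is written next to its source with an .srt extension.
    """
    commands = []
    for name in mpg_files:
        src = Path(name)
        out = src.with_suffix(".srt")
        commands.append([exe, "-srt", str(src), "-o", str(out)])
    return commands

def run_all(mpg_files):
    """Run ccextractor once per file and report each command as it starts."""
    for cmd in build_commands(mpg_files):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=False)

# Example: run_all(["show1.mpg", "show2.mpg"])
```

    Passing the commands as argument lists avoids quoting problems with file names that contain spaces.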

    Other info:

    Using Subtitle Workshop to check timings. System is Win XP Pro. Length of MPG is 43.5 minutes.
  2. What type of file are you starting with, an MPG from a Hauppauge card? SFAIK, this is about the only way to get an MPG with embedded CC.

    I used to use the ATI vcr file, then went to converting the txt file generated separately with the MPG.

    What sort of timing issues are you seeing? Sounds like you are describing a creeping delay, IOW the subs are gradually and increasingly early or late. The only time I had such an issue it was temporary and apparently was related to source and not hardware or software.

    The main issue I run into is the extremely short interval between CC lines, sometimes even overlapping. This would cause sections to fail to display, but I have never consistently had time-synch problems.
  3. Just noticed you are using Subtitle Workshop to check timings. Are you using FF or RW? I have consistently and repeatedly found that this causes a de-synch of display that is not really there. The same file played continuously without FF or RW, or burned to disk and played on a standalone, shows no de-synch at all.

    On the standalone, FF and RW is fine, synch is maintained. From other reports, this seems to happen ONLY with captured MPG, NOT with re-encoded MPG.

    Have you double-checked whether the subs maintain synch but it is the AUDIO which apparently drifts? I found this to be the case on at least some vids but cannot verify that it is the source of the problem. I can verify that many software players show this effect of audio de-synch with captured MPG, especially with FF and RW. A de-mux and re-mux may solve this.

    Only simple solution I have found is just to let the file play, checking periodically for synch.

    I have not checked this last in detail, but usually I can Fast Forward ONE TIME without issues. YMMV.
  4. Ok. Sorry for the omissions.

    The MPG is generated via a Panasonic set-top box DVD Recorder, recorded to a DVD-RW. I import them to the 'puter to edit & author 'em, then reburn with a top quality TY -R or Verbatim +R DL, depending on the project. The Pana keeps the CC info just fine. I'm extracting the CC's after editing with MPG-VCR.

    Yes, I'm jumping ahead in the video to check the subtitle timings, as I just can't sit there and wait through an hour or so's worth of video just to check the timings. In previous burns, my audio is in perfect sync with the video, and I'm not doing anything different now, except extracting the CC info to use as subs.

    OK. I'll take a chance, and re-extract the CC's with the best sync (the original method I tried). We'll burn a -RW and see how it does.

    Any info or suggestions on the frontend / simplification issue?
  5. The only reliable method I have found is to author with the subs, then play the DVD files before burning. Synch is locked in with this method. Or just let the Workshop play the file without FF, do something else and just check it periodically.

    Whatever causes this is related to the difference between a captured file and a re-muxed and/or authored one. My only suggestion would be to try different players. Long ago, when dealing with audio synch only, I found that WinDVD and Vdub were the only players which would maintain synch when scrubbing forward or backward. Since doing CC to subtitles, the same issue has returned, but now I need the player to do subs as well.

    Once I found that synch was being maintained after authoring, this showed that the de-synch displayed in Subtitle Workshop (and other progs) was not really there. I went the other way and automated the edit process for the CC data. Checking the method in detail was very tedious, but checking each individual file is just not feasible.

    On at least two occasions, I determined that the seemingly out-of-synch subs were actually displaying correctly, but it was the audio which had "drifted" somewhat. Again, this indication was false, as after authoring synch was locked with no issues. Many headaches were induced.
  6. Member
    Join Date
    May 2001
    Location
    United States
    You should always check sync with a software based DVD player. Nothing else that I've found can guarantee the sync.

    What exactly does "REALLY out of time" mean, time-wise?

    Also, you do realize that the CCs are timed at the point where they are supposed to start loading (and NOT when they are supposed to be displayed)? They are actually displayed when an {EOC} is reached.
    ICBM target coordinates:
    26° 14' 10.16"N -- 80° 16' 0.91"W
  7. SLK, could you expand a bit on the timing issue? I would think the delay between loading and displaying would be very minimal, and what is an EOC and how can I determine where one is?

    Would this explain why I often get a Start Time for a particular line that is identical to or even before the End Time of the previous line? Sounds like all Start Times from the CC file should actually be at a slightly later point? This overlap has given me real headaches. If, for instance, all start times could just have 1 or 2 tenths of a second added for display as subtitles, ooooh. This is pretty much what I've been doing, but on an "as needed" basis.

    I'm actually working with a txt file generated from the CC info during capture, very similar but different format.
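
    The overlap described above can be handled mechanically rather than "as needed". A minimal sketch in Python, assuming the cues have already been parsed into (start_ms, end_ms, text) tuples sorted by start time. The function name and the 100 ms default gap are illustrative choices, not something any of the tools in this thread is known to do.

```python
def remove_overlaps(cues, min_gap_ms=100):
    """Trim each cue's end time so it finishes min_gap_ms before the next cue starts.

    cues: list of (start_ms, end_ms, text) tuples sorted by start_ms.
    Start times are left untouched; only end times are pulled earlier.
    """
    fixed = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            latest_end = next_start - min_gap_ms
            if end > latest_end:
                # Never let a cue end before it starts.
                end = max(start, latest_end)
        fixed.append((start, end, text))
    return fixed
```

    Trimming the end of the earlier line, instead of pushing the later start, keeps the start times exactly as extracted.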
  8. Member
    Join Date
    Jul 2005
    Location
    USA
    I have never had a timing problem with CCextractor. If the input/output frame rate in Subtitle Workshop is set to anything other than 29.97, this will cause timing problems. I always check the sync with PowerDVD with both CC and subtitles turned on; they are always in perfect sync. You need to use IFOedit to set a software flag for PowerDVD to display the CC.
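
    To put a rough number on the frame rate issue, here is a small sketch in Python. It assumes the tool simply rescales times by the ratio of the two rates; how Subtitle Workshop handles this internally is not confirmed here, and the function name is made up. For a 43.5 minute clip, treating 29.97 as 30 works out to about 2.6 seconds of creep by the end.

```python
def drift_seconds(duration_s, true_fps=29.97, assumed_fps=30.0):
    """Timing error at the end of a clip when frame-based times are
    computed with assumed_fps instead of true_fps.

    Negative means the subtitles appear early, positive means late.
    """
    frames = duration_s * true_fps       # frame count of the real video
    shown_at = frames / assumed_fps      # where the last frame lands at the wrong rate
    return shown_at - duration_s

# 43.5 minutes of 29.97 fps video interpreted as 30 fps
print(round(drift_seconds(43.5 * 60), 2))
# The same clip interpreted as 25 fps (PAL)
print(round(drift_seconds(43.5 * 60, 29.97, 25.0), 1))
```

    The error grows linearly with position in the file, which matches the "starts OK, gets worse" symptom.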
  9. Member
    Join Date
    May 2001
    Location
    United States
    As for timing, each frame has two bytes of data - most characters are a single byte and most control characters are two bytes. A typical line of CC code (one real line of code is shown below) has some minimum requirements. First, the code {RCL}{RCL} (Resume Caption Loading) tells the caption controller to resume loading transmitted captions. Second, the code {1400}{1400} is the positioning information - where on the screen the caption will begin display. Next come the actual displayed characters, followed by {1500}{1500}, which is the positioning of the second line of this particular caption. Following that are the displayed characters for the second line. Next comes {EDM}{EDM} (Erase Displayed Memory), which causes the caption controller to erase the currently displayed caption. Finally, the control characters {EOC}{EOC} (End Of Caption) tell the caption controller that all of the caption has been received and to now display the caption.

    Now if you add up all of this caption, you'll see that it will require 34 frames to load all of the caption info. That means that the caption loading MUST start 34 frames prior to the desired display time. At 30 frames a second, this is just over a second. The timing information shown in the caption (00:02:49:15) is actually 34 frames prior to the desired display time.

    Now most of the time, this preloading is not really noticeable, unless you are really looking for it. However, in times where the dialog comes fast and furious, the delay can be really distracting.

    Note that it is a requirement that all control characters are to be transmitted in duplicates.

    00:02:49:15 {RCL}{RCL}{1400}{1400}REPORTS ARE COMING IN FROM ALL{1500}{1500}OVER THE EMPIRE,{EDM}{EDM}{EOC}{EOC}
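
    The frame tally can be reproduced with a few lines of Python. This is only a sketch under simplified assumptions: every {XXX} token counts as one two-byte control code (one frame), and plain text goes out two characters per frame, padded when a run has an odd length. With these rules the sample line comes out a frame under the 34 quoted above, so the exact bookkeeping evidently differs slightly, but the conclusion (a bit over a second of preloading) is the same.

```python
import re

def frames_to_load(caption_line):
    """Count the frames needed to transmit one caption, at two bytes per frame.

    Assumptions: every {XXX} token is one two-byte control code (one frame),
    and plain text is sent two characters per frame, padded if the run is odd.
    """
    frames = 0
    # Split the line into {control} tokens and the text runs between them.
    for token in re.findall(r"\{[^}]*\}|[^{}]+", caption_line):
        if token.startswith("{"):
            frames += 1
        else:
            frames += (len(token) + 1) // 2
    return frames

line = ("{RCL}{RCL}{1400}{1400}REPORTS ARE COMING IN FROM ALL"
        "{1500}{1500}OVER THE EMPIRE,{EDM}{EDM}{EOC}{EOC}")
n = frames_to_load(line)
print(n, "frames, about", round(n / 29.97, 2), "seconds at 29.97 fps")
```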
  10. Ok, folks. Looks like y'all were right - Subtitle Workshop DOES mess a little with FF and REW, and the timings are correct, after all. I went ahead and burned a test disc, which was fully authored and complete, onto a -RW.

    The only issues I have now:

    The captured CC info has lost its ALL CAPS. It now has normal capitalization. Annoying as hell, but I can live with it, if there's no other solution.

    The burned disc's subtitles were TOO SMALL. This has been fixed.

    Italics and special characters have disappeared somewhere, or have been altered. It's strange, though. In Subtitle Workshop, the italics are there, but the special characters (like musical notes in the CC data) are altered to the paragraph symbol. However, if I load the exact same file, with no alterations, into my authoring application (DVD-Lab Pro 1.53), the italics are totally missing, and the musical notes are still paragraph symbols. I can fix the italics by going through the subs line-by-line in DLP, cross-checking it with the same file loaded into Subtitle Workshop. However, the musical notes and other special characters are a problem I don't know how to fix.

    FYI: I'm using WST_ENG as the font, as it is the same as the CC font. I've tried using others as well, but the characters and italics show no changes. I'm going with 22 as the font size now, to fix the small text issue when played on the TV.
  11. Thanks SLK for that explanation. So a straight CC to Subtitle conversion should be delayed approx 1 second per line.
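
    If a fixed delay turns out to be needed, it can be applied to a whole SRT file in one pass. A minimal sketch in Python, assuming standard HH:MM:SS,mmm timestamps; the function name is made up. Whether a given extractor has already compensated for the preload time is worth checking before applying any delay, and the real preload varies with the length of each caption, so a single value is only an approximation.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(text, delay_ms):
    """Add delay_ms to every HH:MM:SS,mmm timestamp in an SRT document.

    Note: this also touches anything in the dialogue text that happens to
    look like a timestamp, which should be rare.
    """
    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + delay_ms
        total = max(total, 0)  # clamp so a negative delay cannot go below zero
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    return TIME_RE.sub(shift, text)
```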

    I'm pretty sure the musical notes are supposed to be stripped out when CC are converted to subtitles; these are lost along with the positioning information. Most sound effects cues are supposed to be removed as well, but IIRC these are usually left in when converted.
  12. Also just curious, how many spelling errors and/or missing characters are you seeing? Rough estimate.

    I found running them through MS Word spell check was generally worthwhile. Though anything with a lot of proper names and especially foreign or alien language was a royal pain.
  13. No spelling errors, yet. It's mostly all formatting errors.

    For example:

    Open the .srt into Subtitle Workshop. First few lines are italicised, then skip down a couple of minutes to the TV show's main title sequence. There's the show's song lyrics there, which are supposed to be preceded by a musical note, as well as ending each line with another note. Also, during the show, there are lines which should be partially italicised, to show someone speaking off-camera (standard convention).

    Now, all of this stuff shows just fine in Subtitle Workshop, once I replace the junk characters with the real musical notes. (I mistakenly thought that the WST_Engl font was the CC font. I now have the REAL CC font installed off of another computer I own, and I've started replacing the musical note characters via the Character Map utility in Windows.)

    However, once the file is loaded into DLP, everything format-wise is lost. Italics? Bye-Bye. Color text? Forget it. The only way to fix this is to once again go through the entire subtitle file manually line-by-line and compare it with the same file loaded in SW. Then fix accordingly. This is stupid, and beyond annoying. Am I maybe saving the file in SW in the wrong format? I've tried saving in both .srt and .sub formats.
  14. I know the musical notes are a difference between CC and subtitles generally and are supposed to be removed. Sound effects are also included in CC and generally not in Subtitles. The explanation was CC is intended for the hearing-impaired and subs are meant for foreign language speakers.

    SFAIK, color and positioning are supported by the SRT standard but not used in any known software. Italics I honestly don't recall if they are retained or not.

    Other formats do support some of these, but I think once you go to SRT, even if the characteristics are visible in Subtitle Workshop, they are stripped out upon any conversion. Most authoring software will choke on any formatting characters in an SRT stream.
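
    For an authoring tool that chokes on formatting, the tags can be stripped before import. A minimal sketch in Python, assuming the formatting is stored as HTML-like tags (italics, bold, underline, font color), which is the common convention in SRT files; the function name is made up, and other tag styles would need their own pattern.

```python
import re

# Matches simple HTML-like tags such as <i>, </i>, <u>, <font color="#ff0000">.
TAG_RE = re.compile(r"</?(?:i|b|u|font)\b[^>]*>", re.IGNORECASE)

def strip_srt_tags(text):
    """Remove inline formatting tags from SRT text, leaving the words intact."""
    return TAG_RE.sub("", text)
```

    Keeping an untouched copy of the tagged file is a good idea, since the italics cannot be recovered once stripped.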

    Interesting that you don't see missing characters; I get a fair amount of these using either the Bin or Txt methods with my ATI card. Sounds like you are reviewing ALL lines; they're not so obvious on a quick check. Spell-check on a 90-minute movie shows several dozen, on average. Usually just one or two characters dropped out.

    How about capitalization, both for first word of a sentence and proper or place names? Seems about 50-50 for me on names, maybe 80-20 on sentences.
  15. ccextractor works for me with a Hauppauge card but it repeats the lines of closed captions in every entry. How can I solve that problem?
  16. From what I've seen, capitalization is pretty darn good so far. Names are 100% so far, which is far beyond what I was expecting. Sentence beginning is also 100%. However, keep in mind I've done 3 lousy TV shows so far with this app.

    I don't know that the capture card or method matters all that much - it should either capture the line 21 or not. I think it has much more to do with the application, and the show/movie.

    After a little more research, I decided to upgrade to the latest version of DLP - 2.34, I believe it is. With this, one can just insert the line 21 CC data back into the video, and you're good to go. Can't do this in 1.53, which is what I was using. Just gotta figure out how best to extract the CC data now - I'm thinking just do a RAW extract for the line 21, then either convert it to SRT and then import it for subtitles or do a second extraction as SRT.

    Yes, it's a bit overkill, but I want the option of having either CC or subs (from the same data) to be played on ANY TV. I've seen some TV setups where, strangely, the DVD signal somehow bypasses the caption decoder, so the TV plays CC data fine, but the DVD doesn't.

    FYI, on the SRT/SubRip standard: underline, italics, and color formatting are converted, but not flash (which no subtitle format supports), according to the SCC_Tools documentation. SCC (of course), SMI, SSA, and ASS also support formatting. SUB, PSB, and TXT do not support any formatting.
  17. Banned
    Join Date
    Jun 2007
    Location
    UNREACHABLE
    Neisse wrote:

    I've seen some TV setups where, strangely, the DVD signal somehow bypasses the caption decoder, so the TV plays CC data fine, but the DVD doesn't.
    My old Toshiba is a good example of what you have just said.
  18. Member
    Join Date
    Jul 2005
    Location
    USA
    Polarwhite, you may be converting rollup CC. If that is the case there is a lot of duplication and this type of CC is almost impossible to convert to a decent subtitle. Closed Captions (CC) are generally transmitted in one of two forms. Live broadcasts (news, sports) always use Rollup CC, and prerecorded shows invariably use Popup CC. Rollup CC has a window which is continuously scrolling with the dialogue, which makes it VERY difficult to change to a subtitle file. Popup CC is very easy to convert and has very few spelling errors. CC can also be in upper case only or lower case. Italics are used to denote a speaker not in the scene or a commentator.
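
    As a rough illustration of why roll-up output repeats itself, and of one crude way to thin it out, here is a sketch in Python. It assumes each extracted entry has been parsed into a list of text lines, and it keeps only lines that were not already on screen in the previous entry. This is not how ccextractor handles roll-up, and the function name is made up. It will also drop a line that is genuinely spoken twice in a row, and it does not handle lines that grow word by word.

```python
def collapse_rollup(cues):
    """Keep only the lines of each cue that were not already shown in the previous cue.

    cues: list of cues, each a list of text lines in display order.
    Returns the new lines in order, one entry per line.
    """
    output = []
    previous = []
    for lines in cues:
        for line in lines:
            if line not in previous:
                output.append(line)
        previous = lines
    return output
```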

    I use Subtitle Workshop to change all uppercase to lower case and to delete a lot of the dialogue unique to CC. In Subtitle Workshop you can select Tools/Information & Errors/Information & Errors, then click on Fix Errors, and it will correct most of the errors. The music notes show up as paragraph symbols; I just leave them as is. Italics will not show up in the finished subtitle.
  19. BobT:
    Thank you so much. I really appreciate your help in this matter.
  20. Member
    Join Date
    Apr 2007
    Location
    Spain
    Originally Posted by BobT
    Polarwhite, you may be converting rollup CC. If that is the case there is a lot of duplication and this type of CC is almost impossible to convert to a decent subtitle.
    Actually I'm resuming work on ccextractor (someone sent me a bunch of patches and is working with me so I have a motivation) and since a usable roll-up compatible transcript mode is one of the most requested features, it's going to be included in the next version.

    What I need though is some samples to work with. I don't need the whole video files, the .bin files (generated by ccextractor) would do for this. If someone can help with the samples and/or feels like testing the new version, email me at the address shown in ccextractor itself.
  21. Member
    Join Date
    May 2008
    Location
    Colombia
    Hi. I'm trying to extract the closed captions of a DVD. With DVDDecrypter I extract the m2v file and run ccextractor. The output is as follows:

    C:\THE_SC~1\VIDEO_TS>c:\windows\temp\ccextractor.0.39\windows\ccextractorwin.exe
    -srt -fp "VTS_01_1 - 0xE0 - Video - MPEG-2 - 720x480 (NTSC) - 4~3.M2V" -o ep1.srt
    CCExtractor 0.39, cfsmp3 at gmail
    -------------------------------------
    Input: VTS_01_1 - 0xE0 - Video - MPEG-2 - 720x480 (NTSC) - 4~3.M2V
    [Raw Mode: Broadcast] [Extract: 1] [Stream mode: Auto] [Use MythTV code: Auto]
    [Debug: No] [Buffer output: No] [Buffer input: Yes]
    [Autopad: Yes] [GOP pad: No] [Print CC decoder traces: No]
    [Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Input type: MPEG]
    [Add font color data: Yes] [Convert case: No] [Video-edit join: No]
    [Extraction start time: not set (from start)]
    [Extraction end time: not set (to end)]
    Creating ep1.srt

    -----------------------------------------------------------------
    Opening file: VTS_01_1 - 0xE0 - Video - MPEG-2 - 720x480 (NTSC) - 4~3.M2V
    File seems to be an elementary stream, enabling ES mode
    Analyzing data in general mode


    New video information found
    [720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97]

    100% | 09:49

    Total frames time: 00:58:48:561 (105751 frames at 29.97fps)

    Initial GOP time: 00:00:00:000
    Final GOP time: 00:00:00:000 +1F
    Diff. GOP length: 00:00:00:000 +1F (00:00:00:033)

    Total length according to CC blocks: 00:09:49:322
    corrected for number of pics in last GOP: (00:00:00:033)

    Number of likely false picture headers (discarded): 0
    Done, processing time = 591 seconds
    Performance (real length/process time) = 0.00
    This is alpha software. Report issues to cfsmp3 at gmail...

    The extracted text is ok, but the timing is wrong; since the video file is 58 mins long, the subtitle file only reaches 9 minutes.
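
    As a sanity check on the log, the frame count and the 29.97 rate reproduce the "Total frames time" figure, so the file itself really is about 58 minutes long and the 9 minute figure comes only from the "length according to CC blocks" line. A small sketch in Python; the function name is made up, and truncating (not rounding) the milliseconds is an assumption that happens to match the log.

```python
def frames_to_timecode(frames, fps=29.97):
    """Convert a frame count to HH:MM:SS:mmm (milliseconds truncated)."""
    total_ms = int(frames / fps * 1000)
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}:{ms:03d}"

# Frame count taken from the log above.
print(frames_to_timecode(105751))
```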

    I would appreciate any help. Thanks
  22. Member
    Join Date
    Apr 2007
    Location
    Spain
    Originally Posted by juancasilva
    C:\THE_SC~1\VIDEO_TS>c:\windows\temp\ccextractor.0.39\windows\ccextractorwin.exe
    -srt -fp "VTS_01_1 - 0xE0 - Video - MPEG-2 - 720x480 (NTSC) - 4~3.M2V" -o ep1.srt

    The extracted text is ok, but the timing is wrong; since the video file is 58 mins long, the subtitle file only reaches 9 minutes.
    Is it possible that DVDDecrypter split the video into several files? What I see is fine; I'd say that you just processed the first part of a series of video files.

    Show us what's in C:\THE_SC~1\VIDEO_TS and we'll find out
  23. Member
    Join Date
    May 2008
    Location
    Colombia
    Well... I just resolved the problem using the vobsub subtitle ripper wizard; it extracts all the captions and the timing is ok.

    DVDDecrypter ripped the whole DVD video file (58 minutes long); the content of the directory c:\the_sc~1\video_ts is the m2v file only. A further test might be ripping each 1 GB file and extracting the captions.

    Thanks for your help.
  24. Member
    Join Date
    May 2001
    Location
    United States
    Originally Posted by juancasilva
    Hi. I'm trying to extract the closed captions of a DVD. With DVDDecrypter I extract the m2v file and run ccextractor. The output is as follows:

    C:\THE_SC~1\VIDEO_TS>c:\windows\temp\ccextractor.0.39\windows\ccextractorwin.exe
    -srt -fp "VTS_01_1 - 0xE0 - Video - MPEG-2 - 720x480 (NTSC) - 4~3.M2V" -o ep1.srt


    The extracted text is ok, but the timing is wrong; since the video file is 58 mins long, the subtitle file only reaches 9 minutes.

    I would appreciate any help. Thanks
    I suspect that you only have a portion of your video, and CCX is extracting all that is there. IIRC, CCX (what I renamed the file to) can span multiple .VOBs, but you have to specifically call them out on the command line. I don't remember exactly the syntax, but it is something like this:

    C:\ccx -srt -fp VTS_01_1.VOB VTS_01_2.VOB VTS_01_3.VOB -o mycc.raw

    The syntax is readily available by simply entering "ccx" on the command line.