VideoHelp Forum




  1. Member
    Join Date
    Jul 2007
    Location
    Canada
    Hi all,

    I need a way to extract the text of a closed caption file and put it into a normal text file without the time code.
    I want to be able to automatically create a transcript of the clip, minus the timecode and repetitive characters.

    Is there any way to do this?

    Thanks a bunch
    Cheers
    Jim
  2. Member
    Join Date
    May 2001
    Location
    United States
    First, go here: http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML
    and get this package and learn how to use it. Also, be sure to get Carlos Fernandez's CCExtractor tool (this will rip the raw data, and rather quickly, too).

    Once you have the text file with the timings, use XVI32 (a HEX editor with wild card delete) to search and delete "..:..:..:.. " to get rid of all the timings (in this case, the period "." is the wild card).

    Now search for "{.}", then "{..}", then "{...}", then "{....}", etc., and delete each match to get rid of all the "special instruction codes".

    You should now have a text only file, but maybe all in cap letters (most CCs are caps only). Load this into Word and use the "CHANGE CASE" menu to change your file to Sentence type.

    From here, you should be mostly done.
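    The same search-and-delete steps can also be scripted. The following is only a minimal sketch in Python, not part of SCC Tools or XVI32. It assumes the timings look like HH:MM:SS:FF (or HH:MM:SS;FF) followed by a space and that the instruction codes are wrapped in curly braces, so check both patterns against your own file first. The function names are made up for illustration.

```python
import re

# Assumed formats (check against your own file):
#   timings           -> HH:MM:SS:FF or HH:MM:SS;FF, followed by optional spaces
#   instruction codes -> anything wrapped in {curly braces}
TIMECODE = re.compile(r"\d{2}:\d{2}:\d{2}[:;]\d{2}[ \t]*")
BRACE_CODE = re.compile(r"\{[^{}]*\}")


def strip_caption_markup(text):
    """Remove timings and {...} codes, then drop blank lines and extra spaces."""
    text = TIMECODE.sub("", text)
    text = BRACE_CODE.sub("", text)
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def sentence_case(text):
    """Lowercase everything, then capitalize the first letter of each sentence."""
    lowered = text.lower()
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


if __name__ == "__main__":
    sample = "00:00:01:15 {RU3}HELLO THERE.\n00:00:03:02 {CR}HOW ARE YOU?\n"
    print(sentence_case(strip_caption_markup(sample)))
```

    The sentence_case step is cruder than Word's "CHANGE CASE" menu: it will lowercase names and acronyms, so expect some manual cleanup afterwards.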
    ICBM target coordinates:
    26° 14' 10.16"N -- 80° 16' 0.91"W
  3. Member
    Join Date
    Jul 2007
    Location
    Canada
    Thanks SLK,

    This solves the timestamps, but I'll still have the repeated-character problem.
    For example:

    IS
    ISSU
    ISSUES
    ISSUES RE
    ISSUES REGAR
    ISSUES REGARDING

    Is there an app that will trim the repetitive text, so that only the most complete line is left? In this case:
    "ISSUES REGARDING"

    Thanks again.
    Cheers
    Jim
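    One possible way to trim this kind of build-up is to drop every line that the next line simply extends. This is only a sketch in Python under that assumption; collapse_rollup is a made-up name, not an existing tool.

```python
def collapse_rollup(lines):
    """Keep a line only if the line after it does not start with it."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    result = []
    # Pair each line with the one that follows it (None after the last line).
    for current, following in zip(cleaned, cleaned[1:] + [None]):
        if following is not None and following.startswith(current):
            continue  # a longer (or identical) version follows, so skip this one
        result.append(current)
    return result


if __name__ == "__main__":
    captured = [
        "IS",
        "ISSU",
        "ISSUES",
        "ISSUES RE",
        "ISSUES REGAR",
        "ISSUES REGARDING",
    ]
    print(collapse_rollup(captured))
```

    The prefix test is deliberately simple. A genuine short line such as "NO" followed by a line starting with "NOBODY" would also be dropped, so the result still needs a read-through.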
  4. Member
    Join Date
    May 2001
    Location
    United States
    What did you use to extract your User Data (CC text)?
  5. Member
    Join Date
    Jul 2007
    Location
    Canada
    I used mpg2srt.

    The content is being fed by a Hauppauge card in MPEG format.
    I then run it into mpg2srt and it generates the .srt file.

    Will the CCExtractor tool provide better results?
    For example:

    1
    timestamp
    ISSUES REGARDING
    BLAH BLAH BLAH

    2
    timestamp
    MORE ISSUES

    instead of my results from mpg2srt.

    Thanks
    Jim
  6. Member
    Join Date
    May 2001
    Location
    United States
    Originally Posted by Jamesson
    Will the CCExtractor tool provide better results?
    Probably.

    Just know that this:
    IS
    ISSU
    ISSUES
    ISSUES RE
    ISSUES REGAR
    ISSUES REGARDING

    is being screwed up by the Hauppauge card. It does NOT exist on the DVD like this.
  7. Member
    Join Date
    Jul 2005
    Location
    USA
    The repeated words you are seeing in the closed captions are there to produce a continuously scrolling three-line presentation at the bottom of the TV screen. Fortunately this is rarely used for closed captions, as it adds cost to the video production. I have seen it mostly on public service programs. There is no easy way to convert it except to go through the text file manually and delete the words that are duplicated. You should find that every fourth line is complete and the three previous lines are incomplete.

    By the way, MPG2SRT is good for just extracting the closed captions, but its timelines are based on 30fps and not 29.97fps, so you get a time slippage that amounts to about 8 seconds for a 2 hour movie. This slippage will be noticeable about 15-20 minutes into a video.
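    The size of that slippage can be estimated with a little arithmetic. The sketch below is in Python and assumes the only error is that frames are counted at 30 fps while the video actually runs at the NTSC rate of 30000/1001 fps (the exact value behind "29.97"). Under that assumption the nominal drift over 2 hours comes out at a little over 7 seconds, in the same range as the figure above, and at roughly 0.9 seconds after 15 minutes.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # the exact "29.97" frame rate
ASSUMED_FPS = Fraction(30)        # rate the timestamps are assumed to be based on


def drift_seconds(real_seconds):
    """Seconds by which 30 fps timestamps fall behind after real_seconds of video."""
    frames = real_seconds * NTSC_FPS   # frames actually shown in that time
    stamped = frames / ASSUMED_FPS     # time a 30 fps clock assigns to them
    return float(real_seconds - stamped)


if __name__ == "__main__":
    for minutes in (15, 20, 120):
        print(minutes, "min:", round(drift_seconds(minutes * 60), 2), "s")
```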
  8. Member
    Join Date
    May 2001
    Location
    United States
    Bob,
    Are you saying that this is a crawl? I didn't think that this could be done effectively with CCs because they can only start at every fourth position (I suppose you could use spaces as fillers, but the text shown doesn't contain any spaces).

    As for the 8 second adjustment, I believe that McPoodle's CCADJ with the "-a" switch will allow CCs to be "stretched" or "shrunk".
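    For an .srt file, the same kind of stretch can be sketched directly in Python. This is not how CCADJ works internally, just an illustration. It assumes standard SRT timestamps (HH:MM:SS,mmm) and a default factor of 1001/1000, which is the 30 fps versus 29.97 fps ratio. scale_srt_times is a made-up name, and the factor should be adjusted to whatever drift you actually measure.

```python
import re
from fractions import Fraction

# Standard SRT timestamp: hours:minutes:seconds,milliseconds
SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def scale_srt_times(srt_text, factor=Fraction(1001, 1000)):
    """Multiply every SRT timestamp in srt_text by factor."""

    def rescale(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total_ms = ((h * 60 + m) * 60 + s) * 1000 + ms
        total_ms = round(total_ms * factor)
        s, ms = divmod(total_ms, 1000)
        m, s = divmod(s, 60)
        h, m = divmod(m, 60)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return SRT_TIME.sub(rescale, srt_text)


if __name__ == "__main__":
    cue = "1\n01:00:00,000 --> 01:00:02,000\nISSUES REGARDING\n"
    print(scale_srt_times(cue))
```

    A factor below 1 shrinks the timings instead of stretching them.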
  9. Member
    Join Date
    Jul 2005
    Location
    USA
    The continuously crawling closed captions are typically used in live broadcasts on news channels such as Fox and CNN. If you record these broadcasts and pull out the closed captions, you will have a lot of repetition. The type of closed captions that do not repeat are the ones that flash on the screen and then turn off to be replaced by another caption.
    Mpg2srt has a bad time slippage problem. If you use it to extract closed captions and do not use the time codes, then it is not a problem. If you use Mpg2srt to produce a file that is used to make subtitles and then author this to a DVD, you will see the subtitles gradually drift further and further out of sync with the audio dialogue.
    When I put the TV shows I record with closed captions onto a DVD, I also go to the trouble of converting the closed captions into subtitles and placing them on the DVD. The reason is that portable DVD players will only show subtitles! I have used the McPoodle software in the past and it is quite good. I am currently using CCExtractor and have had no problems.
  10. Member
    Join Date
    May 2001
    Location
    United States
    Yes, cfsmp3's CCExtractor program is quite good - and accurate. Carlos has made a significant contribution to the art of subtitle and closed caption capturing/authoring.


