Hi all,
I need a way to extract the text from a closed caption file and put it into a plain text file without the timecodes.
I want to be able to automatically create a transcript of the clip, minus the timecodes and repetitive characters.
Is there any way to do this?
Thanks a bunch
Cheers
Jim
First, go here: http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML
and get this package and learn how to use it. Also, be sure to get Carlos Fernandez's CCExtractor tool (it rips the raw data, and rather quickly, too).
Once you have the text file with the timings, use XVI32 (a hex editor with wildcard delete) to search for and delete "..:..:..:.. " (here the period "." is the wildcard) to get rid of all the timings.
Now search for and delete "{.}", then "{..}", then "{...}", then "{....}", etc., to get rid of all the "special instruction codes".
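For anyone who would rather script the two search-and-delete passes than do them by hand in XVI32, here is a minimal Python sketch. It assumes each line begins with an "HH:MM:SS:FF " timecode and that the special instruction codes are enclosed in curly braces, as described above; the function name and the sample codes in the test are made up for illustration. Two differences from the hex-editor approach: the timecode pattern matches only digits rather than any character, and a single brace pattern covers codes of every length in one pass.

```python
import re

# Assumed timecode format: "HH:MM:SS:FF " (a semicolon before the frame
# field, as used for drop-frame timecode, is also accepted).
TIMECODE = re.compile(r"\d\d:\d\d:\d\d[:;]\d\d ")
# Any run of characters inside braces, e.g. "{X}", "{XX}", "{XXXX}".
CONTROL_CODE = re.compile(r"\{[^{}]*\}")


def strip_timing_and_codes(text: str) -> str:
    """Delete timecodes and {...} codes, then drop lines left empty."""
    text = TIMECODE.sub("", text)
    text = CONTROL_CODE.sub("", text)
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Lines that held nothing but codes end up empty and are dropped, so the result is just the caption text, one line per original line.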
You should now have a text-only file, though it may be all in capital letters (most CCs are all caps). Load it into Word and use the "Change Case" menu to convert the file to Sentence case.
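If Word isn't handy, the case change can be approximated in a few lines of Python. This is a rough sketch, not a reproduction of Word's exact rules: it lowercases everything, capitalizes the start of the text and any letter that follows ".", "!" or "?" plus whitespace, and, as an extra step not mentioned above, restores the pronoun "I". It has no way to know which words are proper nouns, so names still need fixing by hand. The function name is hypothetical.

```python
import re


def sentence_case(text: str) -> str:
    """Rough sentence-case conversion for all-caps caption text."""
    lowered = text.lower()
    # Capitalize the first letter of the text, and the first letter after
    # a sentence-ending ".", "!" or "?" followed by whitespace.
    capitalized = re.sub(
        r"(^\s*|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )
    # Extra step: restore the pronoun "I" (this also fixes "I'm", "I'll", ...).
    return re.sub(r"\bi\b", "I", capitalized)
```

For example, sentence_case("ISSUES REGARDING THE BUDGET. I THINK SO!") gives "Issues regarding the budget. I think so!".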
From here, you should be mostly done.
ICBM target coordinates:
26° 14' 10.16"N -- 80° 16' 0.91"W
Thanks SLK,
This solves the timestamps, but I still have the repeated-characters problem.
Example:
IS
ISSU
ISSUES
ISSUES RE
ISSUES REGAR
ISSUES REGARDING
Is there an app that will trim the repetitive text so that only the most complete line is left? For example:
"ISSUES REGARDING"
Thanks again.
Cheers
Jim
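The pattern in Jim's example is regular enough to automate: each partial line is the beginning of the next one, so a line can be discarded whenever the line after it starts with the same text. A minimal Python sketch of that idea follows; the function name is hypothetical, and the whole approach rests on the assumption that every incomplete line is a prefix of the line that follows it.

```python
from __future__ import annotations


def collapse_growing_lines(lines: list[str]) -> list[str]:
    """Keep only the most complete version of each line as it 'grows'.

    Assumes every partial line is a prefix of the line that follows it,
    as in the IS / ISSU / ISSUES ... example.
    """
    kept: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if kept and line.startswith(kept[-1]):
            # The new line extends (or repeats) the last kept one: replace it.
            kept[-1] = line
        else:
            kept.append(line)
    return kept
```

Fed the six lines from the example, this returns just ["ISSUES REGARDING"]. One known weakness: a short line that happens to be the start of an unrelated next line (say "IS" followed by "ISLAND") would also be merged away, so the result is worth skimming.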
What did you use to extract your User Data (CC text)?
I used mpg2srt.
The content comes from a Hauppauge capture card in MPEG format.
I run that through mpg2srt, and it generates the .srt file.
Will the CCExtractor tool give better results? For example, output like this:
1
timestamp
ISSUES REGARDING
BLAH BLAH BLAH
2
timestamp
MORE ISSUES
instead of my results from mpg2srt.
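Since mpg2srt already writes a standard .srt file, removing the cue numbers and timecodes doesn't need a hex editor. The Python sketch below assumes the usual SRT layout (a cue number, a "start --> end" timing line, then one or more text lines, with cues separated by blank lines). It only strips numbering and timing; it does not try to remove the growing-line repetition. The function name is hypothetical.

```python
import re

# An SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500".
# A period instead of a comma before the milliseconds is also accepted.
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return only the caption text from the contents of an .srt file."""
    text_lines = []
    # Cues are separated by blank lines; "\s*" also absorbs "\r" from CRLF files.
    for cue in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in cue.splitlines() if line.strip()]
        if lines and lines[0].isdigit():
            lines = lines[1:]  # drop the cue number
        if lines and TIMING_LINE.match(lines[0]):
            lines = lines[1:]  # drop the timing line
        text_lines.extend(lines)
    return "\n".join(text_lines)
```

Usage, with a hypothetical file name: open "clip.srt" with encoding="utf-8-sig" (which skips a byte-order mark if present), read it, and print srt_to_text() of the contents. Parsing cue by cue, rather than dropping every all-digit line, keeps caption text such as "42" from being mistaken for a cue number.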
Thanks
Jim
Originally Posted by Jamesson
Just know that this:
IS
ISSU
ISSUES
ISSUES RE
ISSUES REGAR
ISSUES REGARDING
is being screwed up by the Hauppauge card. It does NOT exist on the DVD like this.
The repeated words you are seeing in the closed captions come from a continuously scrolling three-line presentation at the bottom of the TV screen. Fortunately, this style is rarely used for closed captions, as it adds cost to the video production; I have seen it mostly on public service programs. There is no easy way to convert it except to go through the text file manually and delete the duplicated words. You should find that every fourth line is complete and the three lines before it are incomplete.
By the way, MPG2SRT is good for just extracting the closed captions, but its timings are based on 30 fps rather than 29.97 fps, so you get a time slippage of about 7 seconds over a 2-hour movie (the two rates differ by 0.1%, and 0.1% of 7,200 seconds is 7.2 seconds). This slippage becomes noticeable after about 15-20 minutes into a video.
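To put numbers on the drift: the NTSC rate is 30000/1001 fps (about 29.97), and 30 divided by that is exactly 1001/1000, so a timestamp computed at 30 fps comes out about 0.1% too small. A timestamp reading 2:00:00 is really about 2:00:07.2, and at the 15-minute mark the error is already 0.9 seconds. The Python sketch below assumes that this rate mismatch is the only timing error in the mpg2srt output and corrects it by stretching every SRT timestamp by 1001/1000. It illustrates the idea only; it is not a description of how McPoodle's CCADJ or any other tool does it, and the function names are hypothetical.

```python
import re
from fractions import Fraction

# Assumption taken from the post above: mpg2srt derives times as frames / 30,
# while NTSC video really runs at 30000/1001 (about 29.97) frames per second.
ASSUMED_FPS = Fraction(30)
ACTUAL_FPS = Fraction(30000, 1001)
SCALE = ASSUMED_FPS / ACTUAL_FPS  # exactly 1001/1000

SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def drift_seconds(timestamp_s: float) -> float:
    """Seconds by which a 30 fps-based timestamp reading timestamp_s runs early."""
    return float(Fraction(timestamp_s) * (SCALE - 1))


def fix_srt_times(srt: str) -> str:
    """Stretch every SRT timestamp by 1001/1000 to undo the 30 vs 29.97 drift."""

    def rescale(m):
        h, mi, s, ms = (int(g) for g in m.groups())
        total_ms = ((h * 60 + mi) * 60 + s) * 1000 + ms
        fixed = round(total_ms * SCALE)  # nearest whole millisecond
        h, rest = divmod(fixed, 3_600_000)
        mi, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{mi:02d}:{s:02d},{ms:03d}"

    return SRT_TIME.sub(rescale, srt)
```

For example, fix_srt_times("00:15:00,000") returns "00:15:00,900". Using Fraction keeps the 1001/1000 ratio exact, so rounding happens only once, at the final millisecond.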
Bob,
Are you saying that this is a crawl? I didn't think that could be done effectively with CCs, because they can only start at every fourth position (I suppose you could use spaces as fillers, but the text shown doesn't have any spaces).
As for the timing drift, I believe McPoodle's CCADJ with the "-a" switch will let you "stretch" or "shrink" the CCs.
Continuously crawling closed captions are typically used in live broadcasts on news channels such as Fox and CNN. If you record these broadcasts and pull out the closed captions, you will get a lot of repetition. The closed captions that do not repeat are the ones that flash on the screen and then turn off to be replaced by another caption.
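For the live-broadcast case, where each new caption repeats the last rows of the previous one before adding a new row, the overlap can be trimmed by comparing consecutive captions. This Python sketch assumes the captions have already been split into cues, each a list of the rows visible on screen (for example, one SRT cue per screen), and that repeated rows match exactly; the function name is hypothetical. Captions of the flash-on, flash-off kind share no rows, so they pass through unchanged.

```python
from __future__ import annotations


def merge_rollup(cues: list[list[str]]) -> list[str]:
    """Join scrolling caption cues into one transcript without repeated rows.

    Assumes each cue holds the rows visible on screen, and that a new cue
    repeats the last rows of the previous one before adding new rows.
    """
    transcript: list[str] = []
    for cue in cues:
        # Find the largest k where the last k transcript rows equal the
        # first k rows of this cue; those k rows are already recorded.
        overlap = 0
        for k in range(min(len(transcript), len(cue)), 0, -1):
            if transcript[-k:] == cue[:k]:
                overlap = k
                break
        transcript.extend(cue[overlap:])
    return transcript
```

A speaker who genuinely repeats a row across two cues would also be trimmed, so the output is worth a quick read.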
Mpg2srt has a bad time-slippage problem. If you only use it to extract closed captions and don't use the timecodes, it doesn't matter. But if you use Mpg2srt to produce a file for making subtitles and then author that to a DVD, you will see the subtitles drift further and further out of sync with the dialogue.
Whenever I put a TV show I have recorded with closed captions onto a DVD, I also go to the trouble of converting the closed captions into subtitles and adding them to the DVD. The reason is that portable DVD players will only show subtitles! I have used the McPoodle software in the past, and it is quite good. I am currently using CCExtractor and have had no problems.
Yes, cfsmp3's CCExtractor program is quite good, and accurate. Carlos has made a significant contribution to the art of subtitle and closed caption capturing and authoring.