Hi. How can I convert 3D idx/sub subtitles to 2D so I can OCR them without it taking twice as long? Also, when I try to OCR them, neither DVDSubExtractor nor SubtitleEdit recognizes a single character, because the subtitle letters are stretched vertically and don't look like a normal font. So maybe there's a solution for that too, for better OCR. Thanks.
Did a little experimenting, and I don't know if it's the same as in your case, but I created 3D idx/sub subtitles from 2D subtitles and they were indeed elongated as well as duplicated on each frame. I used Subtitle Edit and they OCR'd back to 2D OK, but of course there were duplicate entries for each line. I'm not all that familiar with 3D subtitles, but I would assume the duplicates are what makes them 3D, so converting back is going to show both copies in 2D. That doesn't sound like an easy fix, except with an as-yet-unwritten program.
Back to 2D subtitles
So you can convert to 3D, but can't convert back to 2D? And Subtitle Edit is a killer at OCRing my language: it would take 5 hours to get usable subs with it. That's why I always use DVDSubExtractor, which is much better at OCRing 2D subs. Sadly not 3D subs, though: it asks me to identify every character, every repeated character, and the same characters again from the duplicate lines, over and over with every line, which is just way too much.
As I said earlier, it worked on mine, but yours may be especially elongated, or there may be other factors. Any chance you can post the idx/sub so we can try? If the subs will OCR, then it is possible to write a quick program that cuts each line to half its length, leaving only the correct subs. Getting them OCR'd is the first step, and we need the file to try our favorite OCR utility.
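For anyone wondering what "cut each line to half its length" could look like, here is a minimal Python sketch. It is not the program discussed in this thread; the function names `normalize` and `halve_duplicate` are made up for illustration, and it assumes the OCR produced the same text twice on one line, differing at most in spacing.

```python
def normalize(text: str) -> str:
    """Drop all whitespace so spacing differences do not count."""
    return "".join(text.split())


def halve_duplicate(line: str):
    """Return one copy if `line` is the same text twice, else None.

    Assumption: the two copies match exactly once whitespace is ignored.
    """
    compact = normalize(line)
    half, odd = divmod(len(compact), 2)
    if odd or half == 0 or compact[:half] != compact[half:]:
        return None  # not a clean duplicate, leave it for manual review
    # Walk the original line until half of the non-space characters are used.
    seen = 0
    for i, ch in enumerate(line):
        if not ch.isspace():
            seen += 1
            if seen == half:
                return line[: i + 1].strip()
    return None


print(halve_duplicate("Long time no see. Long time no see."))
print(halve_duplicate("This line is not doubled"))
```

Lines where the two halves disagree (for example because the OCR misread a letter in one copy) come back as `None`, so they can be checked by hand instead of being silently cut.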
Okay, I OCR'ed it, but how do I delete the extra repeating/duplicate text? Looks like it will take manually going through every line.
Okay, just wondered if anyone still needed a quick and dirty method to UN-3D subtitles. I'm almost done with the version below, which parses the doubled lines but doesn't have much error recovery, and I'm sure there would be a ton of suggestions. Only for SRT at the moment...
My OCRed subs are not in double lines, lines splits in random orders. I guess I will remove double texts manually.
Can you give a sample of the OCR lines? I don't understand "...subs are not in double lines, lines splits in random orders...".
If I had a sample, perhaps I could help automate.
Okay, piece of cake. I'll try to attach a zip file here with the subs you sent and the parser that I used. You just drag and drop the SRT file onto the top box and then hit Convert. It keeps all the number lines and the time lines and parses everything else into 2 panels. If the two panels are the same, it will save the line automatically; if they are different, you correct the text and/or pick the one you want (green arrow). I added an Auto checkbox for lines that are essentially the same except for spacing, so you don't have to click all the time. It creates a nameNEW.SRT file, so your original is safe.
There were only 366 lines covering the 01:40:00.00 + movie? That's all I found in the sub/idx you sent. Let me know if it works for you as well.
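The workflow described in this post can be roughly sketched in Python as follows. This is not the attached program, only an approximation of the described behavior: the function names are hypothetical, the file is assumed to be UTF-8 (the real file may use another encoding), and the "two panels" are approximated by splitting each text line at its midpoint.

```python
from pathlib import Path


def split_halves(text):
    """Split a text line at its midpoint into (left, right) panels."""
    mid = len(text) // 2
    return text[:mid].strip(), text[mid:].strip()


def same_ignoring_spaces(a, b):
    """The 'auto' rule: equal once all whitespace is removed."""
    return "".join(a.split()) == "".join(b.split())


def un3d_lines(lines):
    """Return (output_lines, review).

    Blank lines, number lines and time lines pass through unchanged.
    Text lines whose halves match are cut to one copy; the rest are kept
    as-is and listed in `review` as (line_number, left, right).
    """
    out, review = [], []
    for n, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Caveat: a subtitle consisting only of digits looks like a number line.
        if not stripped or stripped.isdigit() or "-->" in stripped:
            out.append(line)
            continue
        left, right = split_halves(stripped)
        if same_ignoring_spaces(left, right):
            out.append(left)
        else:
            out.append(line)
            review.append((n, left, right))
    return out, review


def convert(path, encoding="utf-8"):
    """Write <name>NEW<ext> next to the source so the original stays safe."""
    src = Path(path)
    out, review = un3d_lines(src.read_text(encoding=encoding).splitlines())
    dst = src.with_name(src.stem + "NEW" + src.suffix)
    dst.write_text("\n".join(out) + "\n", encoding=encoding)
    return dst, review
```

Calling `convert("movie.srt")` would write `movieNEW.srt` and return the lines that still need a manual decision. The midpoint split only works when both copies are about the same length, so heavily mangled lines end up in `review`.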
Last edited by Budman1; 27th Dec 2013 at 22:56.
I get errors when clicking convert:
Full log (no key): Pastebin
Also, yeah it's only 366 lines cuz Arnold never talks a lot in his movies lol. Also your .srt is a mess too lol, I don't even recognize some of those letters, I hate it when it converts letters to different letters/characters. Also no spaces and etc.
I run on Win8 x64.
Update: well, I edited the subs manually. So thanks all for help.
Last edited by rhaz; 27th Dec 2013 at 06:07.
I will check on that error. One quick question, if I may: the subtitles appear to be in Lithuanian? What language is your Windows running? That may explain the error you got, since mine is English, codepage 1252. Thanks.
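To illustrate why the codepage could matter (whether it is the actual cause of the error above is not established), here is a small Python sketch. It assumes the text was saved in the Baltic codepage 1257, which is the usual Windows ANSI codepage for Lithuanian; the sample word and the `decode_with` helper are chosen only for illustration.

```python
def decode_with(raw: bytes, codepage: str) -> str:
    """Decode bytes with the given codepage, replacing undecodable bytes."""
    return raw.decode(codepage, errors="replace")


text = "Prieš"                  # illustrative Lithuanian word containing "š"
raw = text.encode("cp1257")     # bytes as a Baltic-codepage program would save them

as_baltic = decode_with(raw, "cp1257")    # read back with the matching codepage
as_western = decode_with(raw, "cp1252")   # read back on an English (1252) system

print(as_baltic)
print(as_western)
print(as_baltic == as_western)
```

The same bytes come out as different letters depending on the codepage used to read them, which is one way "letters converted to different letters" can happen independently of OCR mistakes.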
Okay, I updated the program and verified it using the subtitles you sent. I disabled the Convert button after the first click, so only the left and right OK buttons work until it's done. The link above should have the new one, dated 12/27 09:53 PM.
I do NOT get any strange characters except the normal one for characters that are part of that language. This is what I get:
00:00:44,800 --> 00:00:48,521
00:03:56,320 --> 00:03:59,608
- Seniai nesimatém.
00:04:00,480 --> 00:04:02,562
00:04:03,480 --> 00:04:09,487
Prieé 18 val. praradom sraigtasparni
su ministru ir éios éalies pataréju.
Neblogaiatrodai. - Long time no see.
Eikévidun. 18 hours ago. we lost Helicopters with the Minister and making those country's advisor.
Is that what you see?
Last edited by Budman1; 27th Dec 2013 at 23:08. Reason: update links