VideoHelp Forum
  1. Hi. How do I convert 3D idx/sub subtitles to 2D so I can OCR them without it taking twice as long? Also, when I try to OCR them, neither DVDSubExtractor nor Subtitle Edit recognizes a single character, because the subtitle letters are stretched vertically and don't look like normal letters/fonts. Maybe there's a solution for that too, for better OCR. Thanks.
  2. Member Budman1
    Join Date: Jul 2012
    Location: NORTHWEST ILLINOIS, USA
    I did a little experimenting, and I don't know if it's the same as in your case, but I created 3D idx/sub subtitles from 2D subtitles and they were indeed elongated as well as duplicated in each frame. I used Subtitle Edit and they OCR'd back to 2D OK, but of course there were duplicate entries for each line. I'm not all that familiar with 3D subtitles, but I would assume the duplicates are what make them 3D, and converting back will show both lines in 2D. That doesn't sound like an easy fix, except with an as-yet-unwritten program.
    [Screenshot: 3D subtitles]

    [Screenshot: back to 2D subtitles]
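    Budman1's test suggests each 3D subtitle frame holds two squeezed copies of the text. Assuming a half side-by-side layout, where each eye's copy is squeezed to half width (which would explain letters that look stretched vertically), one hypothetical image-side fix is to keep only the left copy and stretch it back to double width before OCR. The sketch below treats a decoded frame as rows of palette indices; `unsqueeze_left_half` is an illustrative name, and getting pixels out of a real .sub file is a separate step not shown here.

```python
from typing import List

# Hypothetical representation: a decoded subtitle frame as rows of palette indices.
Bitmap = List[List[int]]


def unsqueeze_left_half(bitmap: Bitmap) -> Bitmap:
    """Keep the left-eye copy of a half side-by-side frame and double its width."""
    result: Bitmap = []
    for row in bitmap:
        left = row[: len(row) // 2]  # drop the right-eye copy
        wide: List[int] = []
        for pixel in left:
            wide.extend((pixel, pixel))  # nearest-neighbour 2x horizontal stretch
        result.append(wide)
    return result


# A tiny 2x4 "frame": each row is a left copy followed by a right copy.
frame = [[0, 1, 0, 1],
         [1, 1, 1, 1]]
print(unsqueeze_left_half(frame))  # [[0, 0, 1, 1], [1, 1, 1, 1]]
```

    For an even-width frame the result is as wide as the input, with the right-eye copy discarded. Nearest-neighbour doubling does no interpolation, so no new palette values are introduced.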
  3. Member johns0
    Join Date: Jun 2002
    Location: Canada
    Just OCR the subs with Subtitle Edit and delete the duplicate entries. If that doesn't work, someone else might know a better way, if there is one.
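    If the OCR puts the left-eye and right-eye copies of a line on separate text lines, deleting the duplicate entries can be automated. A minimal sketch, assuming each cue's text has already been split into lines; `drop_duplicate_lines` is a hypothetical helper, and lines are compared with whitespace removed because OCR spacing tends to vary between the two copies:

```python
from typing import List


def drop_duplicate_lines(cue_text: List[str]) -> List[str]:
    """Remove lines that repeat an earlier line of the same cue.

    Lines are compared with all whitespace removed, so OCR spacing differences
    between the two copies do not matter; the first copy is the one kept.
    """
    seen = set()
    kept: List[str] = []
    for line in cue_text:
        key = "".join(line.split())
        if key and key in seen:
            continue  # the other eye's copy of a line we already kept
        seen.add(key)
        kept.append(line)
    return kept


print(drop_duplicate_lines([
    "Neblogai atrodai.",
    "- Seniai nesimatėm.",
    "Neblogai  atrodai.",
    "- Seniai nesimatėm.",
]))  # ['Neblogai atrodai.', '- Seniai nesimatėm.']
```

    Because it remembers every line it has seen, the order of the copies does not matter. The trade-off is that a line that genuinely repeats within one cue would also be dropped.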
  4. So you can convert to 3D, but can't convert back to 2D? And Subtitle Edit is terrible at OCRing my language. It would take 5 hours to get usable subs with it. That's why I always use DVDSubExtractor: it is much better at OCRing 2D subs, but sadly not 3D subs. With those it makes me identify every character, including repeated characters and the same characters from the duplicate lines, over and over again for every line, which is just way too much.
  5. Quote: "And Subtitle Edit is a killer in OCRing my language"
    I don't know what language that would be, but just to make sure: did you install the appropriate dictionary/Tesseract file?
  6. I did, but it gives way too many errors and wrong characters; it even manages to replace actual letters with symbols for no reason. I've been using Subtitle Edit for years now, but not for OCRing. I only use it to edit .srt files that I OCRed with other software.
  7. Member Budman1
    Join Date: Jul 2012
    Location: NORTHWEST ILLINOIS, USA
    As I said earlier, it worked on mine, but yours may be especially elongated or have other factors at play. Any chance you can post the idx/sub so we can try? If the subs will OCR, then it is possible to write a quick program to split each line in half, leaving only the correct subs. Getting them OCR'd is the first step, and we need the file to try our favorite OCR utility.
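    Cutting each line to half its length can be sketched like this, assuming both copies end up on the same OCR'd line (for example `GROBUONIS GROBUONIS`). The function name and the whitespace-insensitive comparison are assumptions for illustration, not taken from Budman1's program:

```python
from typing import Optional


def split_doubled_line(line: str) -> Optional[str]:
    """Return one copy if `line` is the same text written twice, else None.

    Whitespace is ignored when comparing the halves, because OCR often adds
    or drops spaces between the left-eye and right-eye copies.
    """
    compact = "".join(line.split())
    if not compact or len(compact) % 2:
        return None  # empty, or an odd length cannot be an exact double
    half = len(compact) // 2
    if compact[:half] != compact[half:]:
        return None  # the halves differ, so a human has to decide
    # Walk the original string until `half` non-space characters are covered,
    # so the returned copy keeps its own spacing.
    count = 0
    for i, ch in enumerate(line):
        if not ch.isspace():
            count += 1
            if count == half:
                return line[: i + 1].strip()
    return None  # not reached when the halves match


print(split_doubled_line("GROBUONIS GROBUONIS"))  # GROBUONIS
print(split_doubled_line("Long time no see."))    # None
```

    Returning None instead of guessing leaves mismatched lines for manual review, which matters when the OCR misread one copy differently from the other.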
  8. Okay, I OCR'ed it, but how do I delete the extra repeated/duplicate text? It looks like it will take going through every line manually.
  9. Member Budman1
    Join Date: Jul 2012
    Location: NORTHWEST ILLINOIS, USA
    Okay, just wondering if anyone still needs a quick and dirty method to un-3D subtitles. I'm almost done with the version below, which parses the double lines, but it doesn't have much error recovery and I'm sure there would be a ton of suggestions. Only for SRT at the moment...

    [Screenshot of the parser in progress]
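    An SRT-only tool like this only needs to touch the text of each cue. A rough sketch of that round trip, assuming standard SRT layout (number line, timing line, text lines, blank line between cues); `Cue`, `parse_srt` and `format_srt` are hypothetical names, not Budman1's program:

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    number: str       # e.g. "2"
    timing: str       # e.g. "00:03:56,320 --> 00:03:59,608"
    text: List[str]   # the only part a 3D-to-2D cleanup needs to change


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    text = content.replace("\r\n", "\n").strip()
    if not text:
        return []
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", text):  # cues are separated by blank lines
        lines = block.split("\n")
        if len(lines) >= 2 and "-->" in lines[1]:
            cues.append(Cue(lines[0].strip(), lines[1].strip(), lines[2:]))
    return cues


def format_srt(cues: List[Cue]) -> str:
    """Write cues back out with number and timing lines unchanged."""
    if not cues:
        return ""
    blocks = ["\n".join([c.number, c.timing, *c.text]) for c in cues]
    return "\n\n".join(blocks) + "\n"
```

    A per-cue cleanup would build new `Cue` objects with a changed `text` list and pass them to `format_srt`, so numbering and timing survive untouched.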
  10. My OCRed subs are not in double lines; the lines are split in random order. I guess I will remove the duplicate text manually.
  11. Member Budman1
    Join Date: Jul 2012
    Location: NORTHWEST ILLINOIS, USA
    Can you give a sample of the OCR lines? I don't understand this part:
    "...subs are not in double lines, lines splits in random orders. ..."
    If I had a sample, perhaps I could help automate it.
  12. There's a link to the .sub/.idx files.
  13. Member Budman1
    Join Date: Jul 2012
    Location: NORTHWEST ILLINOIS, USA
    Okay, piece of cake. I'll try to attach a zip file here with the subs you sent and the parser that I used. Just drag and drop the SRT file onto the top box and then hit Convert. It keeps all the number lines and the time lines and splits everything else into 2 panels. If the two are the same, it saves them; if they are different, you just correct them and/or click the one you want (green arrow). I added an Auto checkbox for pairs that are essentially the same except for spacing, so you don't have to click all the time. It creates a nameNEW.SRT file, so your original is safe.
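    The save rules described here (save when both panels match; with Auto checked, also accept pairs that differ only in spacing) plus the nameNEW.SRT output name could look roughly like this. Treating "essentially the same except spacing" as "equal once all whitespace is removed" is an assumption, keeping the left panel's text is an arbitrary choice, and the function names are hypothetical:

```python
from pathlib import Path
from typing import Optional


def same_ignoring_spacing(left: str, right: str) -> bool:
    """True when the two panels differ only in whitespace."""
    return "".join(left.split()) == "".join(right.split())


def resolve_pair(left: str, right: str, auto: bool) -> Optional[str]:
    """Return the text to save, or None if the user must pick or correct it."""
    if left == right:
        return left  # identical halves are saved as-is
    if auto and same_ignoring_spacing(left, right):
        return left  # Auto: spacing-only differences are accepted (left wins)
    return None      # real difference: leave it for the user to choose


def new_srt_path(path: str) -> Path:
    """movie.srt -> movieNEW.SRT in the same folder; the original is not touched."""
    p = Path(path)
    return p.with_name(p.stem + "NEW.SRT")


print(resolve_pair("- Long time no see.", "-Long time no see.", auto=True))   # - Long time no see.
print(resolve_pair("- Long time no see.", "-Long time no see.", auto=False))  # None
print(new_srt_path("Predator.srt"))                                           # PredatorNEW.SRT
```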

    There were only 366 lines covering the 01:40:00.00+ movie? That's all I found in the sub/idx you sent. Let me know if it works for you as well.
    Thanks
    Budman1
    Attached Files
    Last edited by Budman1; 27th Dec 2013 at 21:56.
  14. I get errors when I click Convert:

    Full log (no key): Pastebin

    Also, yeah, it's only 366 lines because Arnold never talks much in his movies, lol. Also, your .srt is a mess too, lol. I don't even recognize some of those letters; I hate it when it converts letters into different letters/characters. There are also no spaces, etc.

    I'm running Win8 x64.

    Update: well, I edited the subs manually. So thanks, all, for the help.
    Last edited by rhaz; 27th Dec 2013 at 05:07.
  15. Member Budman1
    Join Date: Jul 2012
    Location: NORTHWEST ILLINOIS, USA
    I will check on that error. One quick question, if I may: the subtitles appear to be in Lithuanian? What language is your Windows running? That may explain the error you got, since mine is English, code page 1252. Thanks
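    Windows-1257 is the Windows code page for the Baltic languages, including Lithuanian, while 1252 covers Western European languages. A program that reads a Lithuanian SRT with whatever the system default happens to be can mis-read letters such as š, ė and į, so one defensive option is to decode explicitly. A minimal sketch, assuming the files are either UTF-8 or Windows-1257; `decode_srt_bytes` is a hypothetical helper, and nothing here confirms that this caused the error above:

```python
def decode_srt_bytes(data: bytes, fallback: str = "cp1257") -> str:
    """Decode SRT bytes explicitly instead of relying on the system code page.

    UTF-8 is tried first (a leading BOM is stripped); bytes that are not valid
    UTF-8 are decoded with `fallback`, Windows-1257 (Baltic) by default.
    This may still raise UnicodeDecodeError if the bytes fit neither encoding.
    """
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode(fallback)


raw = "Prieš 18 val.".encode("cp1257")  # a Lithuanian line saved as Windows-1257
print(raw.decode("cp1252"))   # wrong code page: the š comes out as another character
print(decode_srt_bytes(raw))  # Prieš 18 val.
```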

    Okay, I updated the program and verified it using the subtitles you sent. I disabled the Convert button after the first click, so only the left and right OK buttons work until it's done. The link above should have the new version, dated 12/27 09:53 PM.

    I do NOT get any strange characters except the normal ones that are part of that language. This is what I get:
    1
    00:00:44,800 --> 00:00:48,521
    GROBUONIS

    2
    00:03:56,320 --> 00:03:59,608
    Neblogaiatrodai.
    - Seniai nesimatém.

    3
    00:04:00,480 --> 00:04:02,562
    Eikévidun.

    4
    00:04:03,480 --> 00:04:09,487
    Prieé 18 val. praradom sraigtasparni
    su ministru ir éios éalies pataréju.

    Translation:
    Predator

    You look good.
    - Long time no see.

    Come inside.

    18 hours ago we lost a helicopter carrying the minister and this country's adviser.

    Is that what you see?
    Last edited by Budman1; 27th Dec 2013 at 22:08. Reason: update links