VideoHelp Forum
  1. Member
    Join Date
    Dec 2008
    Location
    Switzerland
    Hi everyone,

    I'm trying to convert idx/sub subtitles to an .srt file.

    I know that the easiest way is to use SubRip, but the issue is that I'm trying to convert idx/sub files for Arabic, Farsi, and Hebrew.

    Those languages are not supported in SubRip. Therefore, I'm trying to convert the subs from the idx/sub file into a .tif file, which can then be converted into text using an OCR program such as IRIS.

    I've tried to use SubtitleCreator, but the quality of the resulting .tif file wasn't encouraging.

    Any other suggestions?

    Kind Regards,
  2. I'm a Super Moderator johns0
    Join Date
    Jun 2002
    Location
    canada
    I think,therefore i am a hamster.
  3. Member
    Join Date
    Dec 2008
    Location
    Switzerland
    Originally Posted by johns0
    Thanks for your reply johns0.

    Please correct me if I'm wrong.

    I tried to open the sub file in Subtitle Edit and then I chose (Yes) when the dialogue box "Import this VobSub subtitle?" popped up.

    I right-clicked on the subtitles and chose "Save all images with html index".

    Now there are two issues:

    1- I need all the subs to be black text on a white background (I know this can be done by manipulating "Image Palette", but I would really appreciate it if someone could guide me through exactly how to do it).

    2- The result of this process is one index.html file with several .png files.
    Is that what I should get, and how can I merge all those .png files into one .tif file?

    Appreciate your feedback & suggestions.

    Kind Regards,
  4. I'm a Super Moderator johns0
    Join Date
    Jun 2002
    Location
    canada
    You did say in your first post that you wanted to convert the idx/sub to srt. I'm not sure if you want text-based subtitles, a bunch of separate tif pictures, or both.
  5. Member
    Join Date
    Dec 2008
    Location
    Switzerland
    Originally Posted by johns0
    You did say in your first post that you wanted to convert the idx/sub to srt. I'm not sure if you want text-based subtitles, a bunch of separate tif pictures, or both.
    That is my ultimate goal, to convert idx/sub into srt.

    The issue now is that current programs like SubRip don't support non-Latin languages such as Arabic, Farsi, and Hebrew.

    My goal is to convert idx/sub files in those languages (Arabic, Farsi, Hebrew) into srt.

    To achieve this, I first need to convert the idx/sub files into a single .tif file (containing all the BMP or PNG images from the subtitle).

    I'll then feed this file into a professional OCR program that supports those languages.

    Now what I need is the following:

    1- A program that will convert the idx/sub file into one .tif file. I don't want separate images.

    2- I need a way to manipulate the idx/sub images to make the Subtitle Font Color in Black & the Subtitle Background Color in White.
  6. Member Cornucopia
    Join Date
    Oct 2001
    Location
    Deep in the Heart of Texas
    I do not think you will find one that does that. You are crossing boundaries between the world of the VIDEO mindset and the world of the DESKTOP PUBLISHING mindset, and there aren't many people, programmers or not, with a full, fluent understanding of both (which is what would be required to get this job done).
    Each time there is a change on the screen, there is a new "picture". In video land, these pictures are either single, like regular photos, or concatenated in time as video. There is no concept of time in DTP land, so pictures must be concatenated in space (bigger, poster size) or in layers (aka "pages"). What you want is layers. So take baby steps, using multiple tools.

    The app you already used has output *.PNG files, so take those and use a batch photo converter to turn them into single-layer ("flattened") .TIF files.
    Then, using Photoshop, Acrobat, or similar, open and combine them all into a single "multi-page" or multi-layer .TIF file.

    THEN you can apply the OCR you are attempting.

    AFAIK, there is no way around those three steps.

    Scott
  7. First: did you try looking on various subtitle sites? It might save you the trouble and the potential OCR errors.

    How would you want them arranged in the "contact sheet"? Vertically (1 x n grid), or horizontally (n x 1 grid) ?

    You can batch process most of the steps Scott mentioned with ffmpeg. Since each png will have different dimensions, you need to pad each png so they all have the same dimensions (not resize, but "pad", as in adding black borders); otherwise they won't fit on the contact sheet properly.
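
    To pick pad dimensions that are large enough for every entry, one option is to scan the exported PNGs first. The following is only a minimal sketch, assuming the images sit in one folder; the function names `png_size` and `largest_size` are made up for illustration. It relies on the PNG file layout, where the width and height are stored as big-endian 32-bit integers in the IHDR chunk right after the 8-byte signature.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"not a PNG file: {path}")
    # Bytes 16..23 hold width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", header[16:24])


def largest_size(folder, pattern="*.png"):
    """Return (max width, max height) over all PNG files matching pattern."""
    sizes = [png_size(p) for p in sorted(Path(folder).glob(pattern))]
    if not sizes:
        raise ValueError("no PNG files found")
    return max(w for w, _ in sizes), max(h for _, h in sizes)
```

    Calling `largest_size(".")` in the export folder gives the smallest width and height that fit every image; any pad size at least that large should work.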

    First, you can customize the palette colors with BDSup2Sub, or you can use ffmpeg to invert the colors if the default palette gives white text. You can also export the png sequence with BDSup2Sub, as you did before.

    In this example, I decided on padding to a final dimension of 720x240. If your subs have multiple lines, long entries, etc., you might need to pad to a larger height than 240 pixels. The "%04d" notation is printf-style pattern syntax: the number is how many digits the counter has, padded with leading zeros. So if your sequence was img00001.png, img00002.png, you would need "img%05d.png".
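
    If the pattern syntax is unfamiliar, Python's `%` formatting follows the same printf convention, so it can be used to check a pattern before handing it to ffmpeg. This is just an illustrative sketch; `sequence_name` and `pattern_for` are hypothetical helper names, and `pattern_for` assumes the counter is the last run of digits in the file name.

```python
import re


def sequence_name(pattern, index):
    """Expand a printf-style pattern such as 'img%05d.png' for one index."""
    return pattern % index


def pattern_for(filename):
    """Derive a pattern from a sample name, e.g. 'img00001.png' -> 'img%05d.png'."""
    match = re.search(r"(\d+)(\D*)$", filename)
    if match is None:
        raise ValueError(f"no counter digits found in {filename!r}")
    width = len(match.group(1))
    return f"{filename[:match.start(1)]}%0{width}d{match.group(2)}"


# prints img00001.png, then img00002.png
for i in (1, 2):
    print(sequence_name("img%05d.png", i))
```

    For example, `pattern_for("input0001.png")` returns `"input%04d.png"`, which is the form used in the commands in this post.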

    The first command will generate the same image sequence, but padded to 720x240. Each entry will be placed at the top-left corner (0,0).

    Code:
    ffmpeg -i "input%04d.png" -vf pad=720:240:0:0 "pad%04d.png"
    But if you didn't fix the palette and the text is white, you can invert it with ffmpeg:
    Code:
    ffmpeg -i "input%04d.png" -vf pad=720:240:0:0,curves=preset=color_negative "pad%04d.png"

    The 2nd command will join the padded image sequence into a (1 x n) "grid" contact sheet, i.e. one column with "n" rows, where "n" is the number of entries. In this example, I used "1000". If you had 3456 entries, change the number to 3456.

    Code:
    ffmpeg -i "pad%04d.png" -vf tile=1x1000 tile.tif
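
    Rather than typing the entry count by hand, the tile command could be assembled from the number of padded files. A rough sketch, assuming the padded images are named with the "pad" prefix as above; `tile_command` and its parameters are hypothetical, and it only builds the argument list without running ffmpeg.

```python
from pathlib import Path


def tile_command(folder, prefix="pad", digits=4, output="tile.tif"):
    """Build the ffmpeg tile command with n set to the number of padded PNGs."""
    folder = Path(folder)
    # Count every file that starts with the prefix and ends in .png.
    count = len(list(folder.glob(f"{prefix}*.png")))
    if count == 0:
        raise ValueError(f"no {prefix}*.png files in {folder}")
    sequence = str(folder / f"{prefix}%0{digits}d.png")
    return ["ffmpeg", "-i", sequence, "-vf", f"tile=1x{count}", str(folder / output)]
```

    The returned list can be passed to `subprocess.run(..., check=True)` once it looks right.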




    If you want a multi-layered PSD or TIF, you can still use the padded image sequence as input to Photoshop (I think GIMP can do it too).
    Last edited by poisondeathray; 26th Apr 2014 at 11:34.
  8. I'm a Super Moderator johns0
    Join Date
    Jun 2002
    Location
    canada
    Just load the idx/sub into Subtitle Edit and choose Start OCR. If you clicked on OK, then you skipped a step. You can also download the desired language dictionary if Subtitle Edit doesn't have it.