I'm trying to convert idx/sub subtitle files to an .srt file.
I know that the easiest way is to use SubRip, but the issue is that I'm trying to convert idx/sub files for languages such as Arabic, Farsi, and Hebrew.
Those languages are not supported in SubRip, so I'm trying to convert the subs from the idx/sub file into a .tif file, which can then be converted into text using one of the OCR programs like IRIS.
I've tried to use SubtitleCreator, but the quality of the resulting .tif file wasn't encouraging.
Any other suggestions?
Please correct me if I'm wrong.
I tried to open the sub file in Subtitle Edit and then I chose (Yes) when the dialogue box "Import this VobSub subtitle?" popped up.
I right clicked on the subtitles and chose "Save all images with html index".
Now there are two issues:
1- I need all the subs to be in black with a white background (I know this can be done by manipulating the "Image Palette", but I would really appreciate it if someone could guide me on how to do it exactly).
2- The result from this process is one index.html file with several .png files.
Is that what I should get, and how can I merge all those .png files into one .tif file?
Appreciate your feedback & suggestions.
You did say in your first post that you wanted to convert the idx/sub to srt. I'm not sure whether you want text-based output, a bunch of separate tif pictures, or both.
The issue now is that current programs like SubRip don't support non-Latin languages such as Arabic, Farsi, and Hebrew.
My goal is to convert idx/sub files in those languages (Arabic, Farsi, Hebrew) into srt.
To achieve this, I first need to convert the idx/sub files into a single .tif file (containing all the BMP or PNG images from the subtitle).
I'll take this file and feed it into a professional OCR program that supports those languages.
Now what I need is the following:
1- A program that will convert the idx/sub file into one .tif file. I don't want separate images.
2- I need a way to manipulate the idx/sub images to make the Subtitle Font Color in Black & the Subtitle Background Color in White.
I do not think you will find one that does that. You are crossing boundaries between the world of the VIDEO mindset and the world of the DESKTOP PUBLISHING mindset, and there aren't many people, programmers or not, with a full, fluent understanding of both (which is what would be required to get this job done).
Each time there is a change on the screen, there is a new "picture". In Vid land, these pictures are singular as regular photos or concatenated in time as video. There is no concept of time in DTP land, so pictures must be concatenated in space (bigger, Poster size) or layers (aka "pages"). What you are wanting is layers. So take baby steps, using multiple tools.
The app you already used has outputted *.PNG files, so take those and use a batch photo converter to single layer ("flattened") .TIF files.
Then, using Photoshop, Acrobat, or similar, open and combine them all into a single "Multi-page" or multi-layer .TIF file.
THEN, you can apply the OCR you are attempting.
AFAIK, no other way around all those (3) steps.
First, did you try looking on various subtitle sites? It might save you the trouble and potential OCR errors.
How would you want them arranged in the "contact sheet"? Vertically (1 x n grid), or horizontally (n x 1 grid)?
You can batch process most of the steps Scott mentioned with ffmpeg. Since each png will have different dimensions, you need to pad each png so they are all the same dimensions (not resize, but "pad" as in adding black borders), otherwise they won't fit on the contact sheet properly.
First, you can customize the palette color with bdsup2sub, or you can use ffmpeg to invert the colors if the default palette is white text. You can also export the png sequence with bdsup2sub, as you did before.
In this example, I decided on padding to a final dimension of 720x240. If your subs have multiple lines, long entries, etc., you might need to pad to a larger height than 240 pixels. The "%04d" notation is sprintf-style wildcard syntax: a zero-padded number with the given count of digits. So if your sequence was img00001.png, img00002.png, you would need "img%05d.png".
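To make the numbering pattern concrete, here is a small Python sketch. Python's `%` operator follows the same printf convention, so it can be used to preview which file names a pattern will match. The helper name `frame_name` is made up for illustration and is not part of ffmpeg.

```python
def frame_name(pattern: str, index: int) -> str:
    """Expand a printf-style pattern (e.g. "img%05d.png") for one frame index."""
    return pattern % index


# Preview the names that "img%05d.png" stands for.
names = [frame_name("img%05d.png", i) for i in (1, 2, 345)]
print(names)  # ['img00001.png', 'img00002.png', 'img00345.png']
```

If the printed names don't match the files on disk, the number of digits in the pattern is probably wrong.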
The first command will generate the same image sequence, but padded to 720x240. Each entry will be placed at the top left (0,0). Use the first line as is, or use the second line instead if you also want to invert the colors.
ffmpeg -i "input%04d.png" -vf pad=720:240:0:0 "pad%04d.png"
ffmpeg -i "input%04d.png" -vf pad=720:240:0:0,curves=preset=color_negative "pad%04d.png"
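The 720x240 above is just an example size. One way to pick the padding size instead of guessing is to read the width and height of every exported png and take the largest of each. The following is a rough Python sketch using only the standard library; the function names (`png_size`, `pad_target`, `pad_filter`) and the folder name in the usage note are assumptions for illustration.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"not a PNG file: {path}")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    return struct.unpack(">II", header[16:24])


def pad_target(folder, pattern="*.png"):
    """Return the smallest (width, height) that every matching PNG fits into."""
    sizes = [png_size(p) for p in sorted(Path(folder).glob(pattern))]
    if not sizes:
        raise ValueError(f"no files matching {pattern} in {folder}")
    return max(w for w, _ in sizes), max(h for _, h in sizes)


def pad_filter(width, height):
    """Build the pad filter string in the same form as the commands above."""
    return f"pad={width}:{height}:0:0"
```

For example, `pad_filter(*pad_target("subs"))` would give the value to put after `-vf`, assuming the images were exported to a folder called `subs`.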
The 2nd command will join the padded image sequence into a (1 x n) "grid" contact sheet, where "n" is the number of entries. In this example, I used "1000". If you had 3456 entries, change the number to 3456.
ffmpeg -i "pad%04d.png" -vf tile=1x1000 tile.tif
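If you don't want to count the entries by hand, the number can be taken from the padded files themselves. Here is a hedged Python sketch that counts them and assembles the same tile command as an argument list; `count_frames` and `tile_command` are illustrative names, and the defaults simply mirror the file names used in this post.

```python
from pathlib import Path


def count_frames(folder, prefix="pad"):
    """Count the padded png files in a folder (e.g. pad0001.png, pad0002.png, ...)."""
    return len(list(Path(folder).glob(f"{prefix}*.png")))


def tile_command(count, pattern="pad%04d.png", output="tile.tif"):
    """Build the ffmpeg argument list for a 1 x count vertical contact sheet."""
    if count < 1:
        raise ValueError("need at least one image to tile")
    return ["ffmpeg", "-i", pattern, "-vf", f"tile=1x{count}", output]


# With 3456 entries, this reproduces the command above with 1000 swapped for 3456.
print(" ".join(tile_command(3456)))
```

The list can be passed to `subprocess.run(..., check=True)` from the folder that holds the padded images, or the printed line can be pasted into a terminal with the pattern quoted as in the command above.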
If you wanted a multilayered PSD or TIF, you can still use the padded image sequence as input into Photoshop (I think GIMP can do it too).
Last edited by poisondeathray; 26th Apr 2014 at 10:34.