I'm trying to extract the subtitles from an AVI movie that comes with IDX and SUB subtitle files. The subtitles are in French. However, when I use the Subtitle Edit application, I can only get it to use an English dictionary, which is not what I need: I want it to recognize French words. I even downloaded the French dictionary through that same application, but it is not offered as an option when I run OCR on those files.
How can I get this to work? Thanks a lot for your help with this issue.
You need both the OpenOffice spell-check dictionary and the Tesseract dictionary (if you're going to run OCR via Tesseract).
The French Tesseract dictionary is available here: http://code.google.com/p/tesseract-ocr/downloads/detail?name=fra.traineddata.gz&can=2&q=
Unpack it (so you end up with fra.traineddata) into the "Tesseract\tessdata" subfolder.
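If you'd rather script the unpacking step than use an archive tool, here is a minimal sketch using only Python's standard library. It assumes you have already downloaded fra.traineddata.gz; the function name `install_traineddata` and the folder paths are hypothetical, so point `tessdata_dir` at wherever your "Tesseract\tessdata" folder actually is.

```python
import gzip
import shutil
from pathlib import Path


def install_traineddata(gz_path, tessdata_dir):
    """Decompress a *.traineddata.gz file into tessdata_dir (hypothetical helper).

    Returns the path of the extracted .traineddata file.
    """
    gz_path = Path(gz_path)
    tessdata_dir = Path(tessdata_dir)
    # Create the tessdata folder if it does not exist yet.
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    # "fra.traineddata.gz" -> "fra.traineddata" (strip only the .gz suffix).
    target = tessdata_dir / gz_path.with_suffix("").name
    # Stream the decompressed bytes straight into the target file.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `install_traineddata(r"C:\Downloads\fra.traineddata.gz", r"C:\Tools\Tesseract\tessdata")` (both paths hypothetical) should leave a fra.traineddata file in that tessdata folder.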