VideoHelp Forum
  1. I need to convert some screenshots of text back into actual text. I originally wanted an OCR program capable of batch processing, but instead I stacked a bunch of screenshots together into one image so I wouldn't need that. They are screenshots of video metadata because some genius decided to archive video info of a whole series into JPEGs... Yeah...
    There must be some lunatic out there who lost all these episodes in a crash so I'll archive a bunch of useful info on my site and make it unsearchable JPEG to make his job that much harder. Seriously...

    Anyway, I only tried FreeOCR so far and it... sucks. 'Nuff said. Any other recommendations?
  2. Did you try training Tesseract? (FreeOCR uses the Tesseract library.)

    https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
    http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-charact...s-recognition/

    Without Tesseract you are probably forced to use commercial software, and ABBYY FineReader seems to be the one most frequently mentioned as the best.
  3. Gray-scale or black and white text is easier for an OCR.
    ABBYY FineReader and Nuance OmniPage are the best commercial options.
    As for freeware, you can try online OCR services, which I believe use Tesseract; they aren't bad.
  4. I got stuck on this part on that tutorial.

    Code:
    N=662 # set accordingly to the number of files that you have
    for i in `seq 0 $N`; do
        tesseract $i.bmp $i batch.nochop makebox
    done
    gives me

    Code:
    bash: seq: command not found
    I'm a little confused because the tutorial never instructs where to point the cygwin program at so it can find the image files. Like, how would it know where to look without specific input from me? The article never mentions this. It just tells me to name them a certain way and run that script in cygwin.
  5. Originally Posted by Aludin View Post
    I got stuck on this part on that tutorial.

    snip

    I'm a little confused because the tutorial never instructs where to point the cygwin program at so it can find the image files. Like, how would it know where to look without specific input from me? The article never mentions this. It just tells me to name them a certain way and run that script in cygwin.
    Can't help, and I'm afraid you may not find help on this forum either. In your place I would ask on the Tesseract forum. Perhaps the only way for you will be to train Tesseract in a Linux environment (I assume you are trying to do this on Windows).
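
A possible way around both problems from post 4, offered as a sketch rather than a tested fix. Tesseract resolves relative names such as 0.bmp against the shell's current directory, which is likely why the tutorial never asks for a folder: you are expected to cd into the folder holding the images before running the loop. seq is an external program (part of GNU coreutils), so the error suggests it is not installed or not on the PATH in that Cygwin setup; a plain counting loop avoids needing it. The function name make_boxes and the DRY_RUN switch are made up for this illustration and do not come from the tutorial.

```shell
# Sketch only: make_boxes and DRY_RUN are hypothetical names, not part of the tutorial.
# The body runs in a subshell ( ... ), so the cd does not change your own shell's directory.
make_boxes() (
    dir=$1     # folder that holds 0.bmp, 1.bmp, ...
    last=$2    # highest file number (N in the tutorial)
    cd "$dir" || exit 1
    i=0
    # Count from 0 to $last with shell arithmetic instead of calling seq.
    while [ "$i" -le "$last" ]; do
        if [ -f "$i.bmp" ]; then
            if [ -n "${DRY_RUN:-}" ]; then
                # Dry run: only print the command that would be executed.
                echo "tesseract $i.bmp $i batch.nochop makebox"
            else
                tesseract "$i.bmp" "$i" batch.nochop makebox
            fi
        fi
        i=$((i + 1))
    done
)
```

In Cygwin, Windows drives appear under /cygdrive, so with the images in a hypothetical folder C:\ocr\training the call would be make_boxes /cygdrive/c/ocr/training 662. Setting DRY_RUN=1 first prints the commands without running Tesseract, which is a quick way to check that the files are being found.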


