OCR software

6th Nov 2018 14:29 #1
Aludin

Member

Join Date
Oct 2016
I need to convert some screenshots of text back into actual text. I originally wanted an OCR tool capable of batch processing, but instead I stacked a bunch of screenshots together into one image to avoid needing that. They are screenshots of video metadata, because some genius decided to archive the video info for a whole series as JPEGs... Yeah...
As if he thought: "There must be some lunatic out there who lost all these episodes in a crash, so I'll archive a bunch of useful info on my site and make it unsearchable JPEGs to make his job that much harder." Seriously...

Anyway, I only tried FreeOCR so far and it... sucks. 'Nuff said. Any other recommendations?

6th Nov 2018 14:47 #2
pandy

Member

Join Date
Sep 2008
Have you tried training Tesseract? (FreeOCR uses the Tesseract library.)

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-charact...s-recognition/

Without Tesseract you are probably forced to use commercial software, and ABBYY FineReader seems to be the one most frequently mentioned as the best.

6th Nov 2018 16:42 #3
blud7

Member

Join Date
Apr 2011
Grayscale or black-and-white text is easier for OCR.
ABBYY FineReader and Nuance OmniPage are the best commercial options.
As for freeware, you can try online OCR services, which I believe use Tesseract; they aren't bad.
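
On the grayscale point, here is a minimal sketch of converting a folder of JPEG screenshots to grayscale PNGs before running OCR on them. It assumes ImageMagick is installed and that its `convert` command is on the PATH; the function name `to_gray` and both folder arguments are made up for illustration. Whether this actually improves recognition on these particular screenshots is untested.

```shell
# Sketch: write a grayscale PNG copy of every .jpg in a folder.
# Assumes ImageMagick's `convert` is on PATH. On Windows the name can clash
# with a system tool of the same name; ImageMagick 7 also installs `magick`.
# to_gray is a hypothetical helper name.
to_gray() (
    src_dir="$1"   # folder with the original screenshots
    out_dir="$2"   # folder for the grayscale copies

    mkdir -p "$out_dir" || exit 1

    for f in "$src_dir"/*.jpg; do
        # If nothing matches, the pattern stays literal, so skip it.
        [ -f "$f" ] || continue
        name=$(basename "$f" .jpg)
        convert "$f" -colorspace Gray "$out_dir/$name.png"
    done
)
```

It would be called as `to_gray screenshots gray`, where both folder names are placeholders. The body runs in a subshell, so the helper variables do not leak into the calling shell.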

7th Nov 2018 11:18 #4
Aludin

Member

Join Date
Oct 2016
I got stuck on this part of that tutorial.

Code:

N=662 # set accordingly to the number of files that you have
for i in `seq 0 $N`; do
    tesseract $i.bmp $i batch.nochop makebox
done

gives me

Code:

bash: seq: command not found

I'm a little confused, because the tutorial never says where to point Cygwin so it can find the image files. How would it know where to look without specific input from me? The article never mentions this. It just tells me to name the files a certain way and run that script in Cygwin.
8th Nov 2018 04:29 #5
pandy

Member

Join Date
Sep 2008
Originally Posted by Aludin

I got stuck on this part of that tutorial.

snip

I'm a little confused, because the tutorial never says where to point Cygwin so it can find the image files. How would it know where to look without specific input from me? The article never mentions this. It just tells me to name the files a certain way and run that script in Cygwin.

I can't help, and I'm afraid you may not find help on this forum either. In your place I would look for help on the Tesseract forum. Perhaps your only option will be to train Tesseract in a Linux environment (I assume you are trying to do this on Windows).
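
That said, two things in the quoted post may be worth checking first; treat this as a rough sketch and not a tested fix. Relative file names such as `0.bmp` are looked up in the shell's current working directory, so the script finds the images only if you `cd` into their folder before running it. The `seq` error can be sidestepped with a plain counting loop that uses only shell built-ins. The function name `make_boxes` and its arguments are invented for illustration; the `tesseract` command line is copied unchanged from the tutorial snippet.

```shell
# Sketch: run the tutorial's makebox step on 0.bmp .. N.bmp without `seq`.
# make_boxes is a hypothetical helper name, not part of the tutorial.
make_boxes() (
    dir="$1"    # folder that holds 0.bmp, 1.bmp, ...
    last="$2"   # highest file number (the tutorial's N)

    # Relative names like "0.bmp" resolve against the current directory,
    # so move into the image folder first. The body is a subshell, so the
    # caller's working directory is left untouched.
    cd "$dir" || exit 1

    i=0
    while [ "$i" -le "$last" ]; do
        if [ -f "$i.bmp" ]; then
            tesseract "$i.bmp" "$i" batch.nochop makebox
        else
            echo "skipping missing file: $i.bmp" >&2
        fi
        i=$((i + 1))
    done
)
```

Under Cygwin, Windows drives are mounted below `/cygdrive`, so a call might look like `make_boxes /cygdrive/c/path/to/images 662`, with the path being a placeholder. `seq` normally comes with the coreutils package, so its absence may also mean the Cygwin installation or PATH is incomplete, but that is a guess.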
