Hi,
I’ve been wondering for a while now how I could OCR a full video.
I know there are programs like SubRip and AviSubDetector, but they aren’t the best. They struggle with complicated fonts, they need a lot of manual input, and they are a bit clunky in general for what I need them to do (OCR hardsubbed cartoon/anime episodes with complicated fonts). They just aren’t practical enough.

So I’m thinking: is there a way to use Google Keep’s OCR to do this? It’s the best OCR I know of at the moment and it recognised everything I threw at it, but it only accepts images that aren’t larger than 4 MB.

I just discovered AutoIt, so I’m wondering: is there a way to automate the process of converting a video into a JPG sequence, cropping every image, uploading the first one to Google Keep, running the OCR, copying the result, pasting it somewhere, deleting the image and uploading the next one?
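
To make the first half of that concrete, here is a rough sketch in Python (not AutoIt) of the "video to cropped JPG sequence" step only. It assumes ffmpeg is installed and that the subtitles sit in the bottom quarter of the picture. The function names, the sampling rate of 2 frames per second and the strip size are all made up for illustration, and the size limit is taken as 4 MiB. The Google Keep upload/copy/delete part is not covered here.

```python
import os
import subprocess

# Size limit mentioned above, taken here as 4 MiB (an assumption).
MAX_BYTES = 4 * 1024 * 1024


def build_ffmpeg_command(video_path, out_dir, fps=2, strip_fraction=0.25):
    """Build an ffmpeg command that samples frames and keeps the bottom strip.

    fps and strip_fraction are illustrative guesses, not tuned values.
    """
    # crop=w:h:x:y -> full width, bottom `strip_fraction` of the height.
    crop = f"crop=iw:ih*{strip_fraction}:0:ih*{1 - strip_fraction}"
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps},{crop}",   # sample frames, then crop each one
        "-q:v", "2",                  # high JPEG quality
        os.path.join(out_dir, "frame_%06d.jpg"),
    ]


def extract_frames(video_path, out_dir, fps=2, strip_fraction=0.25):
    """Run ffmpeg to write the cropped JPG sequence into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    command = build_ffmpeg_command(video_path, out_dir, fps, strip_fraction)
    subprocess.run(command, check=True)


def oversized_frames(out_dir, limit=MAX_BYTES):
    """Return the names of files in out_dir that exceed the size limit."""
    return sorted(
        name for name in os.listdir(out_dir)
        if os.path.getsize(os.path.join(out_dir, name)) > limit
    )
```

Calling extract_frames("episode.mkv", "frames") would fill a "frames" folder with frame_000001.jpg, frame_000002.jpg and so on, and oversized_frames("frames") would list any image that is still too big to upload. Cropping to the subtitle strip should keep most frames well under the limit, but I haven’t measured that.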

Obviously this doesn’t take the whole “timing” question into consideration, so I would end up with a text file containing only the actual dialogue, without timestamps.
But maybe it can be done using AviSubDetector: when it stops because it has detected a subtitle, the AutoIt script grabs the timing from it and pastes it into my text file.
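
Another way to get timings, without AviSubDetector, might be to derive them from the frame numbers. This is only a sketch under some assumptions: the frames were sampled at a known, constant rate, there is one OCR result per frame in order (an empty string when no subtitle is visible), and the OCR returns exactly the same text for every frame of the same subtitle, which real OCR output probably won’t do without some fuzzy matching. All function names are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def frames_to_cues(texts, fps):
    """Merge runs of identical, non-empty frame texts into (start, end, text).

    texts[i] is the OCR result for the i-th sampled frame (0-based);
    fps is the sampling rate used when extracting the frames.
    """
    cues = []
    start = None
    current = ""
    # A trailing empty entry closes a subtitle that runs to the last frame.
    for index, text in enumerate(list(texts) + [""]):
        text = text.strip()
        if text != current:
            if current:
                cues.append((start / fps, index / fps, current))
            start, current = index, text
    return cues


def to_srt(cues):
    """Render (start, end, text) cues as SRT-formatted text."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)
```

With frames sampled at 2 per second, the timings can only be accurate to about half a second, so a higher sampling rate would mean better timing but many more images to OCR.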

These are just ideas, but I’d like to know what you guys think.
Is there a better way to OCR an entire video with a Google OCR engine? (Not Tesseract; as far as I know, it’s not even close to Google Keep performance-wise.)
Or maybe with an even better engine, if one exists.
Let me know.