How to Use ABBYY FineReader to OCR Hardcoded Subtitles From Videos?


30th Mar 2018 23:08 #1
devilcoelhodog

Member

Join Date
Mar 2006

Location
Brazil
Hello, dear all.

As the title says: how can I use ABBYY FineReader (and similar professional, non-freeware programs) to OCR hardcoded subtitles from videos? Is that possible?

There are already some tools that can do this: AVISubDetector, esrXP, VideoSubFinder and even SubRip (SubRip 1.4+ can also rip subs from AVI files with burned-in/hardcoded/permanent subtitles).

---> https://forum.videohelp.com/threads/369591-Ripping-subs-with-OCR-from-DVD-image

---> https://forum.videohelp.com/threads/362255-IdxSubOcr-OCR-on-chinese-traditional-character-program

---> https://forum.videohelp.com/threads/331113-How-to-extract-subtitle-from-videofile-AVI

---> https://www.chinese-forums.com/forums/topic/44954-extracting-chinese-hardsubs-from-a-video/

---> https://forum.doom9.org/showthread.php?t=162721&page=7

---> http://zuggy.wz.cz/guides/video.htm

If it is possible, do you know of a tutorial, guide or video that teaches how to do it properly?

I assume that programs like ABBYY FineReader can OCR images better than the tools I mentioned above. Or not necessarily?

Is ABBYY FineReader a good OCR program? Can you recommend other good ones available nowadays?

I have also heard about OCR using artificial intelligence, but I don't know whether it is too expensive or available to most people, or whether it is superior to the other similar products on the market.

And can ABBYY FineReader (and similar programs) OCR all kinds of "images" into "text", including languages like Chinese, Hebrew, Japanese, Arabic, Russian, Greek, etc.?

Thanks for your tips.

Best regards.

devil (johner)

Last edited by devilcoelhodog; 30th Mar 2018 at 23:43.

31st Mar 2018 05:01 #2
pandy

Member

Join Date
Sep 2008
ABBYY is a Russian company, so at least "Cyrillic" characters (formally this is гражданка, not Cyrillic) are supported with the same or better results as Latin.

Yes, this is one of the best, if not the best, OCR programs available on the market.

How to use it in CLI mode: perhaps this can help: https://stackoverflow.com/questions/16385443/abbyy-finereader-exe-looking-for-cmd-comm...ther-programms (this is definitely possible, as ABBYY FR is frequently used in this mode for ATE).

I believe that after training you can correctly OCR any text with FR.

31st Mar 2018 05:41 #3
devilcoelhodog

Member

Join Date
Mar 2006

Location
Brazil
@ pandy

Hi.

In a video with hardcoded subtitles, how can I make ABBYY OCR the image into text?

See the example below:

---> https://www.youtube.com/watch?v=sUmIiWLoEuo

I guess ABBYY can't get the timing of the lines. But can it OCR the text that is in the video?

Well, in this video the subtitles scroll very, very fast, and this can make it harder for OCR programs to get the text.

One more example:

---> https://www.youtube.com/watch?v=RUpui_-_S-E

Thanks for your tips.

Best regards.

devil (johner)

Last edited by devilcoelhodog; 31st Mar 2018 at 05:48.

31st Mar 2018 07:25 #4
pandy

Member

Join Date
Sep 2008
You can always add a hard (burned-in) timestamp to the video and use OCR to recognise both, so that every piece of text has a timestamp associated with it.
However, you can't ignore that OCR has particular requirements for its input data in order to produce correct text; it may be difficult, if not impossible, to separate such moving text without human help.
If you are interested in the lyrics of such videos, then perhaps it will be easier to use an existing lyrics database instead of creating another one, unless those examples are unrelated to your goal (and merely share the same characteristics).
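
To illustrate how timestamps and recognised text can be tied together, here is a minimal sketch in Python. It does not read a burned-in timestamp; instead it assumes you already have a list of (time in seconds, OCR text) samples, however you obtained them (from a burned-in timestamp or from the frame number and frame rate). The names `Cue`, `samples_to_cues` and `to_srt` are made up for this example and are not part of FineReader or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def samples_to_cues(samples, step):
    """Merge consecutive samples with identical text into one cue.

    samples: list of (time_in_seconds, ocr_text), sorted by time.
    step:    assumed spacing between samples, in seconds.
    An empty OCR result means "no subtitle on screen" and closes the cue.
    """
    cues = []
    current = None
    for t, text in samples:
        text = text.strip()
        if current is not None and text == current.text:
            current.end = t + step      # same line still on screen
            continue
        if current is not None:
            cues.append(current)        # text changed or disappeared
        current = Cue(t, t + step, text) if text else None
    if current is not None:
        cues.append(current)
    return cues


def to_srt(cues):
    """Render cues as the text of an .srt file."""
    blocks = []
    for n, c in enumerate(cues, 1):
        blocks.append(f"{n}\n{srt_time(c.start)} --> {srt_time(c.end)}\n{c.text}\n")
    return "\n".join(blocks)


# Example: one sample every 0.2 s (i.e. every 5th frame of a 25 fps video).
samples = [(0.0, ""), (0.2, "Hello"), (0.4, "Hello"), (0.6, ""), (0.8, "World")]
cues = samples_to_cues(samples, step=0.2)
print(to_srt(cues))
```

This only merges results that are exactly equal; real OCR output usually differs slightly from frame to frame, so some fuzzy comparison is normally needed on top of it.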

The main problem I see is redundant data. You can export the video to pictures, crop the area with the subtitles to reduce the amount of data, and improve the contrast: usually you convert to grayscale, increase the contrast, perhaps apply some morphological operations, and finally threshold to create a set of B/W pictures to feed to the OCR. But at some point you will end up with a series of almost identical text results (as in the case of fast horizontally scrolling lyrics), so you need to start recognising the redundancies and eliminating them. The redundancy can be reduced by decimating the pictures, for example keeping 1 of every 5, but unless you find the proper decimation factor you will still get a lot of almost duplicated data.
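
The decimation and redundancy removal described above can be sketched roughly like this in Python. This is only an illustration using the standard library: it assumes the OCR step has already produced one text string per picture, and the function names and the 0.8 similarity threshold are arbitrary choices for the example, not something taken from FineReader or the other tools in this thread. The image preprocessing (crop, grayscale, threshold) is left out, since it would normally be done with an image library or the video tool itself.

```python
from difflib import SequenceMatcher


def decimate(frames, keep_every=5):
    """Keep one item out of every `keep_every` (e.g. 1 of 5 pictures)."""
    return frames[::keep_every]


def similar(a, b, threshold=0.8):
    """True if two OCR results are nearly the same text."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def collapse_near_duplicates(texts, threshold=0.8):
    """Collapse runs of nearly identical consecutive OCR results.

    Within each run the longest variant is kept. An empty result
    (no subtitle on screen) ends the current run, so the same line
    shown twice with a gap in between is kept twice.
    """
    kept = []
    in_run = False
    for text in texts:
        text = text.strip()
        if not text:
            in_run = False
            continue
        if in_run and similar(kept[-1], text, threshold):
            if len(text) > len(kept[-1]):
                kept[-1] = text
        else:
            kept.append(text)
            in_run = True
    return kept


# Example: OCR output for consecutive pictures, with typical small errors.
ocr_results = [
    "Hello world", "Hello world", "Hello worId", "",
    "Second line", "Second line!",
]
print(decimate(list(range(12)), keep_every=5))
print(collapse_near_duplicates(ocr_results))
```

For fast scrolling text the consecutive results are shifted windows of the same sentence, not simple repeats, so the threshold would need tuning and some manual cleanup is probably still required.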

29th Dec 2018 10:26 #5
AlvoErrado2

Member

Join Date
Jul 2013

Location
Brazil
Originally Posted by devilcoelhodog

Hello, dear all.

As the title says: how can I use ABBYY FineReader (and similar professional, non-freeware programs) to OCR hardcoded subtitles from videos? Is that possible?

There are already some tools that can do this: AVISubDetector, esrXP, VideoSubFinder and even SubRip (SubRip 1.4+ can also rip subs from AVI files with burned-in/hardcoded/permanent subtitles).

Here are some tutorials showing how to use VideoSubFinder + ABBYY FineReader; you use both programs together to rip the video subtitles.

https://www.youtube.com/watch?v=VHsUfqqAkWY
https://www.youtube.com/watch?v=uTbTARWeZGw
https://www.youtube.com/watch?v=3MDEr-Lb_Cs
http://proyectohardsubs.blogspot.com/2014/12/tutorial-hardsubs-videosubfinderabby.html
http://jumonjigiri.blogspot.com/p/extracting-hardsubs.html
