Distinguish Images with Text

12th Apr 2024 04:56 #1
Subtitles

Member

Join Date
Mar 2021

Location
Israel
For some time I have been looking for a way to separate thousands of images that contain text from images that don't.
The goal is to delete all the images without text and keep only those with text.
Doing it manually takes a very long time, and all the mouse and keyboard work wears out my fingers.
With AI in the news all the time these days, I am wondering whether there is an automatic method that does this job accurately and quickly.
Thanks for any suggestions, whether AI-related or an existing application that I have missed.

12th Apr 2024 11:20 #2
TreeTops

Member

Join Date
May 2010

Location
Oregon
AI might do that someday. But for now I suspect you have to do it manually.

Extraordinary claims require extraordinary evidence -Carl Sagan

12th Apr 2024 14:33 #3
zing269

Member

Join Date
Jul 2009

Location
United States
You might be able to do something with one of these projects.

https://www.geeksforgeeks.org/python-ocr-on-all-the-images-present-in-a-folder-simultaneously/

https://github.com/Aneapiy/OCR-Image-Sort

Even if you don't need the text extracted you can use that info to select the images.
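A minimal sketch of that selection step, assuming some OCR tool (such as the projects linked above) can return a string for each image. The names `looks_like_text` and `select_images_with_text`, the `min_chars` threshold, and the `ocr` callable are all hypothetical and only illustrate the idea; they are not part of either linked project.

```python
from typing import Callable, Iterable, List


def looks_like_text(ocr_output: str, min_chars: int = 3) -> bool:
    """Return True if the OCR output holds at least min_chars letters or digits.

    The threshold is an assumption: OCR engines can emit a stray symbol or two
    on images with no real text, so a small cutoff filters out that noise.
    """
    meaningful = sum(1 for ch in ocr_output if ch.isalnum())
    return meaningful >= min_chars


def select_images_with_text(
    paths: Iterable[str],
    ocr: Callable[[str], str],
    min_chars: int = 3,
) -> List[str]:
    """Keep only the paths whose OCR output looks like real text.

    `ocr` is a placeholder for whatever function maps an image path to the
    extracted string; the text itself is discarded after the check.
    """
    return [p for p in paths if looks_like_text(ocr(p), min_chars)]
```

The extracted text is used only as a yes/no signal here, so the OCR quality matters much less than it would for actual transcription.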

12th Apr 2024 16:52 #4
jagabo

Member

Join Date
Dec 2005
Is there anything consistent about the text? Is it always in the same location on the frame? Always the same color or brightness? For example, a video might have bright yellow subtitles that appear near the bottom of the screen. One can detect which frames have subtitles by looking for bright yellow in that area. You're not detecting text per se, but bright yellow stuff. You'll sometimes get a false positive (some other yellow object is there) but you can at least pre-screen the bulk of the images.
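A rough sketch of that pre-screening idea in Python, assuming the subtitles are bright yellow and sit in the bottom quarter of the frame. The function names, the colour thresholds (`min_rg`, `max_b`), the `band` size, and the `min_fraction` cutoff are all assumptions that would need tuning against real images. Pixels are passed in as rows of `(R, G, B)` tuples; loading real files into that form would need an imaging library, which is left out here.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def is_bright_yellow(pixel: Pixel, min_rg: int = 200, max_b: int = 100) -> bool:
    """Treat a pixel as bright yellow when red and green are high and blue is low."""
    r, g, b = pixel
    return r >= min_rg and g >= min_rg and b <= max_b


def yellow_fraction(rows: Sequence[Sequence[Pixel]], band: float = 0.25) -> float:
    """Return the fraction of bright-yellow pixels in the bottom `band` of the image."""
    if not rows:
        return 0.0
    start = int(len(rows) * (1.0 - band))  # first row of the bottom band
    region = rows[start:]
    total = sum(len(row) for row in region)
    if total == 0:
        return 0.0
    hits = sum(1 for row in region for px in row if is_bright_yellow(px))
    return hits / total


def probably_has_subtitle(
    rows: Sequence[Sequence[Pixel]],
    band: float = 0.25,
    min_fraction: float = 0.01,
) -> bool:
    """Flag the image when enough of the bottom band is bright yellow."""
    return yellow_fraction(rows, band) >= min_fraction
```

As noted above, this detects yellow areas rather than text, so some false positives are expected and the flagged images still need a quick look.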

13th Apr 2024 05:21 #5
Subtitles

Member

Join Date
Mar 2021

Location
Israel
Originally Posted by zing269

You might be able to do something with one of these projects.

https://www.geeksforgeeks.org/python-ocr-on-all-the-images-present-in-a-folder-simultaneously/

https://github.com/Aneapiy/OCR-Image-Sort

Even if you don't need the text extracted you can use that info to select the images.

Thanks for trying to help.
Both projects work by running OCR on ALL the images to extract whatever text is there. I am trying to avoid that by first deleting all the images that don't have text.

13th Apr 2024 05:31 #6
Subtitles

Member

Join Date
Mar 2021

Location
Israel
Originally Posted by jagabo

Is there anything consistent about the text? Is it always in the same location on the frame? Always the same color or brightness? For example, a video might have bright yellow subtitles that appear near the bottom of the screen. One can detect which frames have subtitles by looking for bright yellow in that area. You're not detecting text per se, but bright yellow stuff. You'll sometimes get a false positive (some other yellow object is there) but you can at least pre-screen the bulk of the images.

Interesting idea to look for a certain colour. How can I do this automatically?

13th Apr 2024 08:15 #7
jagabo

Member

Join Date
Dec 2005
AviSynth and a batch script.
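The post does not include the actual script. As a stand-in for the batch half, here is a sketch in Python rather than AviSynth and a batch file. It assumes some detection function already exists and is passed in as `keep`; the name `set_aside_images`, the `no_text` folder name, and the suffix list are hypothetical. Rejected images are moved to a subfolder instead of deleted, so a false negative can still be recovered.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Union

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def set_aside_images(
    folder: Union[str, Path],
    keep: Callable[[Path], bool],
    reject_dir_name: str = "no_text",
) -> List[Path]:
    """Move every image for which keep(path) is False into a subfolder.

    Returns the new locations of the moved files. Non-image files and
    subdirectories are left untouched.
    """
    folder = Path(folder)
    reject_dir = folder / reject_dir_name
    moved: List[Path] = []
    # sorted() snapshots the listing before any file is moved.
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if keep(path):
            continue
        reject_dir.mkdir(exist_ok=True)
        target = reject_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Once the `no_text` folder has been checked by eye, it can be deleted in one go.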
