VideoHelp Forum
  1. Member
    Join Date
    Mar 2021
    Location
    Israel
    For some time I have been looking for a way to separate thousands of images that contain text from those that don't.
    The goal is to delete all the images without text and keep only the ones with text.
    Doing it manually takes a very long time and wears out my fingers from using the mouse and keyboard.
    With all the talk about AI these days, I am wondering whether there is an automatic method that can do this job quickly and accurately.
    Thanks for any suggestions, whether AI-related or an existing application that I have missed.
  2. AI might do that someday. But for now I suspect you have to do it manually.
    Extraordinary claims require extraordinary evidence -Carl Sagan
  3. Member
    Join Date
    Jul 2009
    Location
    United States
    You might be able to do something with one of these projects.

    https://www.geeksforgeeks.org/python-ocr-on-all-the-images-present-in-a-folder-simultaneously/

    https://github.com/Aneapiy/OCR-Image-Sort

    Even if you don't need the text extracted you can use that info to select the images.
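    As a rough illustration of using the OCR result only to sort the images, here is a minimal Python sketch. It assumes Tesseract plus the pytesseract and Pillow packages are installed, and it is not the code from either project linked above. `split_by_text`, `looks_like_text` and the 4-character threshold are hypothetical names and values chosen for illustration. The OCR function is passed in as a parameter so another engine can be swapped in.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Assumed set of extensions to treat as images; adjust to your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def looks_like_text(ocr_output: str, min_chars: int = 4) -> bool:
    """Decide whether OCR output counts as 'has text'.

    OCR often returns stray punctuation or whitespace for images with no
    real text, so only alphanumeric characters are counted. min_chars is an
    arbitrary threshold to tune on a sample of your own images.
    """
    return sum(ch.isalnum() for ch in ocr_output) >= min_chars


def tesseract_ocr(path: Path) -> str:
    # Imported lazily so the rest of this sketch works without Tesseract.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def split_by_text(folder: Path,
                  ocr: Callable[[Path], str] = tesseract_ocr
                  ) -> Tuple[List[Path], List[Path]]:
    """Return (with_text, without_text) lists of image paths in folder."""
    with_text: List[Path] = []
    without_text: List[Path] = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip subfolders and non-image files
        target = with_text if looks_like_text(ocr(path)) else without_text
        target.append(path)
    return with_text, without_text
```

    Calling `split_by_text(Path("frames"))` on a hypothetical `frames` folder would give two lists; the second one is the set of candidates for deletion. This still runs OCR on every image.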
  4. Is there anything consistent about the text? Is it always in the same location on the frame? Always the same color or brightness? For example, a video might have bright yellow subtitles that appear near the bottom of the screen. One can detect which frames have subtitles by looking for bright yellow in that area. You're not detecting text per se, but bright yellow stuff. You'll sometimes get a false positive (some other yellow object is there) but you can at least pre-screen the bulk of the images.
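    A minimal Python sketch of this pre-screen follows, assuming the frames can be loaded as RGB arrays with NumPy and Pillow. Here "bright yellow" is approximated as red and green both at least 180 and blue at most 100. The subtitle area is taken as the bottom quarter of the frame, and a frame is flagged when at least 0.2% of that area is yellow. All of these numbers are guesses to tune against your own images, and the function names are hypothetical.

```python
import numpy as np


def yellow_fraction(rgb: np.ndarray, bottom: float = 0.25,
                    min_rg: int = 180, max_b: int = 100) -> float:
    """Fraction of pixels in the bottom band of an H x W x 3 uint8 RGB frame
    that look bright yellow (red and green high, blue low).

    The default thresholds are assumptions, not values from any standard.
    """
    height = rgb.shape[0]
    band = rgb[int(height * (1 - bottom)):]  # keep only the bottom rows
    r, g, b = band[..., 0], band[..., 1], band[..., 2]
    mask = (r >= min_rg) & (g >= min_rg) & (b <= max_b)
    return float(mask.mean())


def probably_has_subtitles(rgb: np.ndarray, min_fraction: float = 0.002) -> bool:
    # 0.2% of the bottom quarter of a 1920x1080 frame is roughly 1,000 pixels.
    return yellow_fraction(rgb) >= min_fraction


def load_rgb(path: str) -> np.ndarray:
    # Pillow is imported here so the functions above only need NumPy.
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))
```

    As noted above, other yellow objects in that area will still produce false positives, so this is a pre-screen rather than true text detection.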
  5. Member
    Join Date
    Mar 2021
    Location
    Israel
    Originally Posted by zing269
    You might be able to do something with one of these projects.

    https://www.geeksforgeeks.org/python-ocr-on-all-the-images-present-in-a-folder-simultaneously/

    https://github.com/Aneapiy/OCR-Image-Sort

    Even if you don't need the text extracted you can use that info to select the images.
    Thanks for trying to help.
    Both projects use OCR to extract the text, if there is any, from ALL of the images. I am trying to avoid that by first deleting all the images that don't contain text.
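    The ordering described here (a cheap test first, OCR only on what survives) can be sketched as a small two-stage filter in Python. `cascade`, `cheap` and `expensive` are hypothetical names. The cheap test might be something like the colour check suggested earlier in the thread, and the expensive one an OCR call.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def cascade(items: Iterable[T],
            cheap: Callable[[T], bool],
            expensive: Callable[[T], bool]) -> Tuple[List[T], List[T], List[T]]:
    """Two-stage filter.

    Every item goes through the cheap check; only items that pass it are
    handed to the expensive check (e.g. OCR). Returns
    (kept, rejected_by_cheap, rejected_by_expensive).
    """
    kept: List[T] = []
    rejected_cheap: List[T] = []
    rejected_expensive: List[T] = []
    for item in items:
        if not cheap(item):
            rejected_cheap.append(item)  # never reaches the expensive check
        elif expensive(item):
            kept.append(item)
        else:
            rejected_expensive.append(item)
    return kept, rejected_cheap, rejected_expensive
```

    Anything the cheap test rejects is never OCR'd, so that test should be tuned to err on the side of keeping images. A false positive only costs one extra OCR call, but a false negative loses an image with text.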
  6. Member
    Join Date
    Mar 2021
    Location
    Israel
    Originally Posted by jagabo
    Is there anything consistent about the text? Is it always in the same location on the frame? Always the same color or brightness? For example, a video might have bright yellow subtitles that appear near the bottom of the screen. One can detect which frames have subtitles by looking for bright yellow in that area. You're not detecting text per se, but bright yellow stuff. You'll sometimes get a false positive (some other yellow object is there) but you can at least pre-screen the bulk of the images.
    Interesting idea to look for a certain colour. How can I do this automatically?
  7. AviSynth and a batch script.
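    The AviSynth side isn't shown here. The sketch below is a rough Python stand-in for the batch-script part. It walks a folder and moves images that fail a check into a subfolder instead of deleting them, so mistakes can still be reviewed. `move_rejects`, the `no_text` folder name and the list of extensions are assumptions. `has_text` stands for whatever per-image test you settle on, for example a yellow-pixel check.

```python
import shutil
from pathlib import Path
from typing import Callable, List

# Assumed set of extensions to treat as images; adjust to your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def move_rejects(folder: Path, has_text: Callable[[Path], bool],
                 reject_dir_name: str = "no_text",
                 dry_run: bool = True) -> List[Path]:
    """Move images for which has_text() is False into folder/reject_dir_name.

    With dry_run=True (the default) nothing is moved; the list of files that
    would be moved is returned so it can be checked first.
    """
    to_move = [p for p in sorted(folder.iterdir())
               if p.is_file()
               and p.suffix.lower() in IMAGE_SUFFIXES
               and not has_text(p)]
    if not dry_run:
        reject_dir = folder / reject_dir_name
        reject_dir.mkdir(exist_ok=True)
        for p in to_move:
            shutil.move(str(p), str(reject_dir / p.name))
    return to_move
```

    Run it once with the default `dry_run=True` to see which files would be moved, then again with `dry_run=False`. Only delete the `no_text` folder after spot-checking it.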