VideoHelp Forum

  1. Over the past weeks I was trying out VideoSubFinder.

    I got some very good results whenever the video was good quality. Didn't have to be HD. Even some old VHS tapes from 1985 gave reasonable results. (Not perfect but needing relatively little correction.)

    However, I didn't get very far at all with yellow and/or Hebrew subtitles. I set a custom color during the process, but that didn't seem to change anything.


    Are there any tips and tricks that I missed? Alternatives?


    Yellow, typical ARTE subtitles: https://www.youtube.com/watch?v=NX_mR4_J_wo
    The full video is over three hours long and has 2,000 subtitle items, so it really would save a lot of effort.
    Of course, it's not crystal clear, but I have a slightly better copy; I wasn't working straight from YouTube.

    Another case was a video I had in 480p which looked sharp, but here the letters were yellow and Hebrew: https://www.youtube.com/watch?v=AlVxUplK9DU
    Of course 480 is not very large but it was sharp picture quality.

    So maybe the yellow is causing lower results and more OCR gibberish?
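One way to test that suspicion is to isolate the yellow pixels yourself and hand the OCR engine a plain black-on-white image. The sketch below is only an illustration of that idea, not something VideoSubFinder or any other tool in this thread is known to do. The function names, the hue window of 40 to 70 degrees, and the saturation and brightness cut-offs are all assumptions that would need tuning per video.

```python
import colorsys

def is_yellow(rgb, hue_range=(40, 70), min_sat=0.4, min_val=0.5):
    """Return True if an (R, G, B) pixel (0-255 each) looks like bright yellow.

    The thresholds are illustrative guesses, not values taken from any tool.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    return hue_range[0] <= hue_deg <= hue_range[1] and s >= min_sat and v >= min_val

def binarize_yellow(pixels):
    """Map rows of RGB pixels to rows of 0 (subtitle text) / 255 (background)."""
    return [[0 if is_yellow(p) else 255 for p in row] for row in pixels]

# Tiny hand-made "frame": yellow text pixels next to white, dark and red ones.
frame = [
    [(255, 255, 0), (255, 255, 255), (60, 60, 0)],
    [(255, 240, 80), (255, 0, 0), (0, 0, 0)],
]
mask = binarize_yellow(frame)
```

If the mask still contains a lot of background clutter with the hue window narrowed, the color is probably not the main problem and the source sharpness is.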


    Grateful for all advice. Optional subs would allow translation and then I (and others) could watch these things.
    This is nøt å signåture.™
  2. Member (joined Oct 2021; location: At Doom9)
    I don't get good results with it either; that's why I wrote InpaintDelogo, to get perfect or close to perfect results.
    Btw, atm it's not fine-tuned for hardsubs on SD sources as I never needed to OCR one, but it should output a workable result.
  3. Originally Posted by VoodooFX
    I don't get good results with it either; that's why I wrote InpaintDelogo, to get perfect or close to perfect results.
    Btw, atm it's not fine-tuned for hardsubs on SD sources as I never needed to OCR one, but it should output a workable result.
    It would be good if it worked. After all, hardsubs are often found on older sources.
    And would that work on the Hebrew alphabet?

    Looks like I've got a learning curve ahead to figure out how to work scripts...!

    The delogo examples there seem superior! But - don't take this the wrong way - VideoSubFinder already does a good job. I don't suppose I could get you to just try out one video for me?
    Of course, your time is limited. But maybe you're curious too if you can outdo the competition?
  4. Member
    Originally Posted by Spiny Norman
    And would that work on the Hebrew alphabet?
    Looks like I've got a learning curve ahead to figure out how to work scripts...!
    It doesn't care about alphabets.
    You need at least a basic grasp of AviSynth+. The actual workflow is not complicated.

    Originally Posted by Spiny Norman
    But maybe you're curious too if you can outdo the competition?
    It's already outdone, if you meant VideoSubFinder. Maybe there are cases where VSF is good; I just don't know, as I don't use it.

    When you say "good", how many false positives, wrong splits and artifacts are you getting? Every time I tried, I got hundreds to thousands. I could write those subs down by hand faster than I could sort through all that mess. With InpaintDelogo I get zero to a few errors.

    Recently, at one user's request, I did a comparison on very bad quality hardsubs [just stats, I didn't compare anything else, too messy]:
    InpaintDelogo: 0h 16m = 1058 images
    VideoSubFinder: 1h 30m = 2853 images
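
To put those two stat lines side by side, here is a small worked calculation. It uses only the figures quoted above; the variable names are made up for the example, and the ratios say nothing about OCR accuracy by themselves.

```python
# Figures exactly as quoted in the post above.
runs = {
    "InpaintDelogo": {"minutes": 0 * 60 + 16, "images": 1058},
    "VideoSubFinder": {"minutes": 1 * 60 + 30, "images": 2853},
}

# How much longer VideoSubFinder ran, and how many more images it produced.
time_ratio = runs["VideoSubFinder"]["minutes"] / runs["InpaintDelogo"]["minutes"]
image_ratio = runs["VideoSubFinder"]["images"] / runs["InpaintDelogo"]["images"]
extra_images = runs["VideoSubFinder"]["images"] - runs["InpaintDelogo"]["images"]

print(f"run time ratio: {time_ratio:.3f}")
print(f"image count ratio: {image_ratio:.3f}")
print(f"extra images to review: {extra_images}")
```

So on that sample VideoSubFinder took roughly 5.6 times as long and produced roughly 2.7 times as many images, i.e. 1795 more images to sort through. Whether all of the extra images are false positives or wrong splits is not stated in the stats.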

    Originally Posted by Spiny Norman
    I don't suppose I could get you to just try out one video for me?
    I can give it a try, but I won't check whether there are any issues, and there will be, as atm it's tuned only for 720p and above sources.
    That's if you promise to do a ~full comparison.
  5. Well, VSF gets most of the timings right even with just SD. Text quality depends strongly on the video quality, but at least you're correcting rather than typing and timing.

    Thank you, I'm sending you a link in a few minutes, much appreciated. The worst that can happen is that it doesn't register much.
  6. Member
    Here you go: https://mega.nz/file/baowVSpa#q7aV1j_AIqv2XLc-uzsnPkTpbwYQ4PE0s9lHFzG5cpg

    That video is nowhere near HD as you wrote in your PM; it's plain SD. Let me know how it compares.
  7. Originally Posted by VoodooFX
    Here you go: https://mega.nz/file/baowVSpa#q7aV1j_AIqv2XLc-uzsnPkTpbwYQ4PE0s9lHFzG5cpg

    That video is nowhere near HD as you wrote in your PM; it's plain SD. Let me know how it compares.
    Thank you, I ran these through ABBYY for OCR and the results are vastly superior:

    Originally Posted by VideoSubFinder
    reconstituer
    ' smàüc
    ^,.0
    But now it becomes
    "et tâchons de reconstituer l'ensemble du corps."

    Even if you don't know French, it seems pretty clear who won.
    Extra points for the ^ accent!

    I will find a way to convert these 2,000 txt files into one srt and then it's perfect.
    Or nearly perfect. I wasn't able to evaluate the timings yet. (I saw a few empty entries, but I prefer that to missing out on lines, and they can be quickly deleted.)


    Looks like I'll have to leave the comfort of the GUI for this!
    [Attached image: vlcsnap-2021-11-28-11h56m31s775.png]

  8. Member
    Originally Posted by Spiny Norman
    I will find a way to convert these 2,000 txt files into one srt and then it's perfect.
    Or nearly perfect. I wasn't able to evaluate the timings yet. (I saw a few empty entries, but I prefer that to missing out on lines, and they can be quickly deleted.)
    In my book I would call them "not bad"; for my "perfect" there should be ~0 errors.

    It's a simple task with Subtitle Edit: "File>Import>Plain Text...". There, select "Multiple files" and "Generate time codes".
    Images can be imported the same way: "File>Import>Images".

    Auto-OCR engines such as FineReader or Tesseract are not good enough for me. I use precise binary comparison methods.
    On images like these, the nOCR method is good:



    PS:
    Btw, I don't see you posting at your usual place. Did they wrong you in some way?
    Last edited by VoodooFX; 28th Nov 2021 at 10:22.
  9. Oh right, I see, Subtitle Edit picks up on the timings in the file names. Neat!
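
For anyone who would rather script this step than use the Subtitle Edit import, here is a minimal sketch of the same idea: read the start and end time from each file name and write one SRT. The file-name pattern `h_mm_ss_mmm__h_mm_ss_mmm` is an assumption about how the images and txt files are named, and `build_srt` is a hypothetical helper, not part of any tool mentioned in this thread. Check the pattern against your own files before relying on it.

```python
import re
from pathlib import Path

# Assumed naming: start and end time as h_mm_ss_mmm, joined by a double underscore.
# Anything after the end time (extra suffixes, the extension) is ignored.
NAME_RE = re.compile(r"(\d+)_(\d{2})_(\d{2})_(\d{3})__(\d+)_(\d{2})_(\d{2})_(\d{3})")

def srt_time(h, m, s, ms):
    """Format the captured name parts as an SRT timestamp (HH:MM:SS,mmm)."""
    return f"{int(h):02d}:{m}:{s},{ms}"

def build_srt(folder):
    """Merge per-subtitle .txt files in `folder` into a single SRT string."""
    entries = []
    for path in Path(folder).glob("*.txt"):
        match = NAME_RE.match(path.name)
        if not match:
            continue  # skip files that don't follow the assumed pattern
        # utf-8-sig reads UTF-8 with or without a byte order mark
        text = path.read_text(encoding="utf-8-sig").strip()
        if not text:
            continue  # drop empty OCR results
        parts = match.groups()
        start_key = tuple(int(p) for p in parts[:4])
        entries.append((start_key, srt_time(*parts[:4]), srt_time(*parts[4:]), text))
    entries.sort(key=lambda entry: entry[0])  # order by start time
    blocks = [
        f"{number}\n{start} --> {end}\n{text}\n"
        for number, (_, start, end, text) in enumerate(entries, 1)
    ]
    return "\n".join(blocks)
```

Usage would be something like `Path("movie.srt").write_text(build_srt("TXTResults"), encoding="utf-8")`, where both names are placeholders. Empty txt files are skipped, which takes care of the empty entries mentioned earlier.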

    Well, it means we also get to compare Subtitle Edit with ABBYY on their OCR results.


    Sorry, I didn't recognise you, I thought it had been copied there by an attentive user!
    (Yes, they disappointed me severely. So I'm not contributing there as much as I might. Except maybe these results.)
  10. PS Yes, Tesseract beats ABBYY FineReader. However, nOCR got lost in too much doubt (or too little certainty in the images).

    Huge advances made where this program is concerned. (Over 3 hours, not desirable to just type by hand...)
  11. Member
    Probably nOCR needs cleaner images than these.
  12. Originally Posted by VoodooFX
    Probably nOCR needs cleaner images than these.
    Definitely - during processing I saw the jagged edges enlarged, and the source was not too good...

    The rest I need to rip is smaller, but sharper, so we'll see.
  13. Member
    Originally Posted by Spiny Norman
    The rest I need to rip is smaller, but sharper, so we'll see.
    nOCR is for big letters/images; for smaller ones, like DVD subs, use "Binary image compare".
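
To give a rough idea of what comparing binary images means in general, here is a toy sketch: each glyph is a small grid of 0/1 pixels, and a candidate is accepted only if it differs from a stored glyph in at most `max_diff` pixels. This is not Subtitle Edit's actual implementation; the function names, the glyph database and the tolerance are all invented for illustration.

```python
def count_differences(glyph_a, glyph_b):
    """Count differing pixels between two 0/1 bitmaps, or None if sizes differ."""
    if len(glyph_a) != len(glyph_b):
        return None
    if any(len(row_a) != len(row_b) for row_a, row_b in zip(glyph_a, glyph_b)):
        return None
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )

def recognize(glyph, known_glyphs, max_diff=0):
    """Return the character whose stored bitmap is closest, within max_diff pixels."""
    best_char, best_diff = None, None
    for char, bitmap in known_glyphs.items():
        diff = count_differences(glyph, bitmap)
        if diff is None:
            continue  # different size, cannot be the same glyph
        if best_diff is None or diff < best_diff:
            best_char, best_diff = char, diff
    if best_diff is not None and best_diff <= max_diff:
        return best_char
    return None  # unknown glyph: a real tool would ask the user to label it

# Hypothetical 3x3 glyph database.
known = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # one pixel off from "L"
```

With `max_diff=0` the noisy glyph is rejected, and with `max_diff=1` it is read as "L". That strictness is why this kind of matching suits clean, consistently rendered subtitles such as DVD subs and struggles with jagged edges.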
  14. Thanks again!


