Over the past few weeks I've been trying out VideoSubFinder.
I got some very good results whenever the video was good quality. It didn't have to be HD. Even some old VHS tapes from 1985 gave reasonable results. (Not perfect, but needing relatively little correction.)
However, I didn't get very far at all with yellow and/or Hebrew subtitles. I set a custom color during the process, but that didn't seem to change anything.
Are there any tips and tricks that I missed? Alternatives?
Yellow, typical ARTE subtitles: https://www.youtube.com/watch?v=NX_mR4_J_wo
And the full video is over three hours and 2000 items so it really would save a lot of effort.
Of course, it's not crystal clear, but I have a slightly better copy; I wasn't working straight from YouTube.
Another case was a video I had in 480p which looked sharp, but here the letters were yellow and Hebrew: https://www.youtube.com/watch?v=AlVxUplK9DU
Of course 480 is not very large but it was sharp picture quality.
So maybe the yellow is causing lower results and more OCR gibberish?
Grateful for all advice. Optional subs would allow translation and then I (and others) could watch these things.
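One way to check whether the yellow color itself is the problem is to turn the yellow pixels into plain black-on-white before OCR. The following is only a minimal sketch of that idea, not what VideoSubFinder does internally: the function names `is_yellow` and `yellow_mask` and the hue, saturation, and brightness thresholds are assumptions chosen for illustration, and would need tuning per video.

```python
import colorsys

def is_yellow(r, g, b, hue_range=(40, 70), min_sat=0.4, min_val=0.5):
    """Return True if an 8-bit RGB pixel falls in an (assumed) yellow band."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = h * 360  # colorsys returns hue in the range 0..1
    return hue_range[0] <= hue_deg <= hue_range[1] and s >= min_sat and v >= min_val

def yellow_mask(pixels):
    """Binarize rows of (r, g, b) tuples: yellow becomes black (0), the rest white (255)."""
    return [[0 if is_yellow(*px) else 255 for px in row] for row in pixels]

if __name__ == "__main__":
    # Tiny hypothetical 1x3 "frame": yellow text pixel, white background, dark outline.
    frame = [[(240, 220, 60), (255, 255, 255), (10, 10, 10)]]
    print(yellow_mask(frame))
```

If the mask of a sample frame shows clean letter shapes, the color is probably not the issue and the OCR stage (or the Hebrew language data) is the more likely culprit.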
This is nøt å signåture.™
And would that work on the Hebrew alphabet?
Looks like I've got a learning curve ahead to figure out how to work scripts...!
The delogo examples there seem superior! But - don't take this the wrong way - VideoSubFinder already does a good job. I don't suppose I could get you to just try out one video for me?
Of course, your time is limited. But maybe you're curious too if you can outdo the competition?
You need to grasp at least the basics of AviSynth+. The actual workflow is not complicated.
As for VideoSubFinder, maybe there are cases where VSF is good; I just don't know, as I don't use it.
By "good", how many false positives, wrong splits and artifacts are you getting? Every time I tried it I got from hundreds to thousands. It would be faster to write those subs down by hand than to sort through all that mess. With InpaintDelogo I get zero to a few errors.
Recently, at one user's request, I did a comparison on very bad quality hardsubs [just stats, I didn't compare anything else, too messy]:
InpaintDelogo: 0h 16m = 1058 images
VideoSubFinder: 1h 30m = 2853 images
If you promise to do a ~full comparison.
Well, VSF gets most of the timings right even with just SD; text quality depends strongly on the video quality, but at least you're correcting rather than typing and timing.
Thank you, I'm sending you a link in a few minutes, much appreciated. And the worst that can happen is that it doesn't register much.
Originally Posted by VideoSubFinder
"et tâchons de reconstituer l'ensemble du corps."
Even if you don't know French, it seems pretty clear who won.
Extra points for the ^ accent!
I will find a way to convert these 2,000 txt files into one srt and then it's perfect.
Or nearly perfect. I wasn't able to evaluate the timings yet. (I saw a few empty entries but I prefer that to missing out on lines, and, they can be quickly deleted.)
Looks like I'll have to leave the comfort of the GUI for this!
It's a simple task with Subtitle Edit: "File>Import>Plain Text...". Select "Multiple files" and "Generate time codes" there.
Images can be imported the same way: "File>Import>Images".
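For anyone who prefers a script over the GUI, the same merge can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that each text file name starts with its start and end time in the form `H_MM_SS_mmm__H_MM_SS_mmm` (for example `0_00_01_200__0_00_03_400.txt`). The names `build_srt` and `merge_folder` are made up for this example, and the pattern needs adjusting if your files are named differently.

```python
import re
from pathlib import Path

# Assumed file name layout: start and end time, e.g. 0_00_01_200__0_00_03_400.txt
STAMP = re.compile(r"(\d+)_(\d{2})_(\d{2})_(\d{3})__(\d+)_(\d{2})_(\d{2})_(\d{3})")

def srt_time(h, m, s, ms):
    """Format matched time parts as an SRT timestamp (HH:MM:SS,mmm)."""
    return f"{int(h):02d}:{m}:{s},{ms}"

def build_srt(entries):
    """Build SRT text from (file_name, text) pairs, skipping empty or unmatched ones."""
    cues = []
    for name, text in entries:
        match = STAMP.match(name)
        text = text.strip()
        if not match or not text:
            continue  # empty OCR results and unrelated files are dropped
        parts = match.groups()
        sort_key = tuple(int(x) for x in parts[:4])
        cues.append((sort_key, srt_time(*parts[:4]), srt_time(*parts[4:]), text))
    cues.sort(key=lambda cue: cue[0])  # order by start time, not by file name
    blocks = [
        f"{i}\n{start} --> {end}\n{text}\n"
        for i, (_, start, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)

def merge_folder(folder, out_file):
    """Read every .txt file in a folder and write a single .srt file."""
    entries = ((p.name, p.read_text(encoding="utf-8")) for p in Path(folder).glob("*.txt"))
    Path(out_file).write_text(build_srt(entries), encoding="utf-8")
```

Calling `merge_folder("TXTResults", "movie.srt")` (both names are placeholders) would write one subtitle file. Empty entries are skipped rather than kept, which is a design choice; drop the `not text` check if you would rather review them in Subtitle Edit.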
Auto-OCR engines such as FineReader or Tesseract are not good enough for me. I use precise binary comparison methods.
On images like these, the nOCR method is good.
Btw, I don't see you posting at your usual place. Did they wrong you in some way?
Last edited by VoodooFX; 28th Nov 2021 at 09:22.
Oh right, I see, Subtitle Edit picks up on the timings in the file names. Neat!
Well, it means we also get to compare Subtitle Edit with ABBYY for their OCR results.
Sorry, I didn't recognise you, I thought it had been copied there by an attentive user!
(Yes, they disappointed me severely. So I'm not contributing there as much as I might. Except maybe these results.)
PS Yes, Tesseract beats ABBYY FineReader. However, nOCR got lost in too much doubt (or too little certainty in the images).
Huge advances have been made where this program is concerned. (At over 3 hours, it's not something you'd want to just type by hand...)
Probably nOCR needs cleaner images than these.
Thanks again!