VideoHelp Forum
  1. I was wondering whether VOSK or Whisper has had any major updates for use with Subtitle Edit.
    A check of Alpha Cephei shows the vosk-model-en-us-0.22-lgraph model still on the list.
    As for Whisper, I don't know what progress there might have been at all.

    Last year VOSK mentioned a major update for 2024.
  2. Originally Posted by loninappleton
        I was wondering whether VOSK or Whisper has had any major updates for use with Subtitle Edit.
        A check of Alpha Cephei shows the vosk-model-en-us-0.22-lgraph model still on the list.
        As for Whisper, I don't know what progress there might have been at all.

        Last year VOSK mentioned a major update for 2024.

    There were a couple of updates yesterday that seemed to break some things on my end.
    Long subtitles of about 8 seconds now cause it to freak out and jump back up to the top.
    I got a perfect transcription twice with the one 8-second bit split into two.
    Then I adjusted the timing to match the scene dialog, ran it again, and got something completely different...
    Sometimes it's great, but other times it just doesn't work the way you want it to,
    and I guess that's just the nature of these AI tools and you gotta take it or leave it.
  3. Thanks for answering. Sounds like neither has had any major updates.
  4. Hi!

    There are a plethora of options in Subtitle Edit. Which audio-to-text method/option is the fastest? The default, Faster-Whisper, seems ironically obnoxiously slow.
  5. @mzso

    Thanks for the response. I checked Vosk a short time ago, expecting their announced significant upgrade to have appeared, but the old model numbers are still on the site. And yes, I do everything through Subtitle Edit, not knowing anything about the command line.

    Just now I'm not active in subtitling. The few I had a real _mission_ to do for my own use I've completed and put in a shareable source for others. But that's another story. The projects are all plays: some had the AI treatment or transcribed subs and needed a lot of editing; for others I ran the tools in Subtitle Edit, then finished them using the printed published scripts and added HI (hearing impaired) descriptions.

    One thing that stopped me on a project was a documentary about opera star Maria Callas. Whisper would hang when it encountered musical portions and became useless. It may have been my low-end PC or the source DVD audio. I haven't gotten past those sorts of problems.
  6. Member (Join Date: Mar 2021; Location: Israel)
    Originally Posted by mzso
        Hi!

        There are a plethora of options in Subtitle Edit. Which audio-to-text method/option is the fastest? The default, Faster-Whisper, seems ironically obnoxiously slow.

    I have been using Whisper AI (command line) ever since I read about its existence. Subtitle Edit has several options, but I prefer the command-line method because it is more flexible.
    The most helpful parameter I have found to add to a run is
    --condition_on_previous_text False (default is True),
    which stops it repeating the same sentence over and over.
    Several expensive software applications incorporate speech-to-text, such as Vegas and DaVinci Resolve Studio. I would love to give them a try, but I am not going to pay that much money without the chance to evaluate them first.
    So in the meantime I am sticking with Whisper AI:
    https://github.com/openai/whisper
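
    For anyone who wants to script a run like that, here is a minimal Python sketch. It assumes the whisper command from the openai/whisper package is installed and on the PATH. The helper names, the file name and the model choice are placeholders for illustration only; the one setting taken from the post above is --condition_on_previous_text False.

```python
import subprocess


def build_whisper_command(media_path, model="medium", output_format="srt"):
    """Return the argument list for one whisper CLI run (hypothetical helper)."""
    return [
        "whisper", media_path,
        "--model", model,
        "--output_format", output_format,
        # Do not feed the previous window's text back in as a prompt;
        # this is the setting credited above with stopping repeated sentences.
        "--condition_on_previous_text", "False",
    ]


def transcribe(media_path, **options):
    """Run the command; raises CalledProcessError if whisper exits with an error."""
    subprocess.run(build_whisper_command(media_path, **options), check=True)
```

    Calling transcribe("episode.mkv") would then be roughly the same as typing the whisper command by hand with that parameter added.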
  7. Member (Join Date: Apr 2007; Location: Australia)
    @Subtitles
    +1
    But, I use this:
    https://github.com/Purfview/whisper-standalone-win/releases/tag/Faster-Whisper-XXL
    Everything needed, except the language models, is in the one package.

    You can copy your models over, or nominate the _models folder with
    Code:
    --model_dir "D:\Whisper-XXL\_models" --model large-v2
    For me, the language models are here:
    Code:
     Directory of D:\Whisper-XXL\_models\faster-whisper-medium
    03/10/2024  01:13 PM    <DIR>          .
    03/10/2024  01:13 PM    <DIR>          ..
    03/10/2024  12:59 PM             2,574 config.json
    03/10/2024  01:13 PM       788,826,555 model.bin
    03/10/2024  12:59 PM               339 preprocessor_config.json
    03/10/2024  12:59 PM         2,405,678 tokenizer.json
    03/10/2024  12:59 PM           825,480 vocabulary.json
                   6 File(s)    792,060,626 bytes
    
     Directory of D:\Whisper-XXL\_models\faster-whisper-large-v2
    03/10/2024  01:10 PM    <DIR>          .
    03/10/2024  01:10 PM    <DIR>          ..
    17/08/2023  10:01 AM             2,796 config.json
    17/08/2023  10:57 AM     3,086,912,962 model.bin
    17/08/2023  10:01 AM         2,203,239 tokenizer.json
    17/08/2023  10:01 AM           459,861 vocabulary.txt
                   5 File(s)  3,089,578,858 bytes
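
    If you copy models over by hand, a quick check that a model folder is where --model_dir expects it can save a failed run. This is only a sketch: it assumes the faster-whisper-<name> folder naming and the model.bin file seen in the listing above, and the function names are made up for illustration.

```python
from pathlib import Path


def model_folder(model_dir, model_name):
    """Folder for a model, assuming the faster-whisper-<name> naming from the listing."""
    return Path(model_dir) / f"faster-whisper-{model_name}"


def model_is_present(model_dir, model_name):
    """True if the assumed folder exists and contains model.bin."""
    return (model_folder(model_dir, model_name) / "model.bin").is_file()
```

    For example, model_is_present(r"D:\Whisper-XXL\_models", "large-v2") checks for the second folder in the listing.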
    Last edited by pcspeak; 28th Oct 2024 at 11:05. Reason: Clarity
  8. Member (Join Date: Mar 2021; Location: Israel)
    @pcspeak
    Thanks for sharing.
    How accurate is this version?
    With Whisper AI I still have the problem of syncing the first few sentences of text to the audio. I just deal with it manually; it is not a big issue.
  9. Member (Join Date: Apr 2007; Location: Australia)
    Originally Posted by Subtitles
        Thanks for sharing.
        How accurate is this version?

    Can't say. I haven't used your version for 8 months. With both using the same language models,
    I'm guessing it's about the same. For me it's the speed.

    Some of the following parameters are the defaults. I leave them in, and tweak them when looking for better results.
    medium_float32bb3.cmd
    Code:
    @echo off
    rem Transcribe every video in d:\aa and write the output next to each source file.
    for %%a in ("d:\aa\*.mkv", "d:\aa\*.mp4", "d:\aa\*.avi", "d:\aa\*.mpg") do (
        "D:\Whisper-XXL\faster-whisper-xxl.exe" "%%a" ^
        --compute_type=int8_float32 ^
        --model_dir "D:\Whisper-XXL\_models" --model medium.en ^
        --vad_filter true --task transcribe --sentence --language en ^
        --beam_size 3 --best_of 3 --verbose true -o "%%~dpa\"
    )
    echo All done. Press any key to Exit. &pause>nul
    I find --vad_filter false helps for noisy videos, e.g. motor racing.
    If I'm not happy with the result, I mostly switch to the large-v2 model and try --beam_size 8 --best_of 8. I don't use the large-v3 or distil models at all.
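
    The same two-step approach can be written down as a small Python sketch that only builds the command lines, so they can be inspected before anything is run. The paths and parameters are the ones from the batch file above; the function names and the noisy switch are invented for illustration.

```python
FASTER_WHISPER_EXE = r"D:\Whisper-XXL\faster-whisper-xxl.exe"
MODEL_DIR = r"D:\Whisper-XXL\_models"

# (model, beam_size and best_of) pairs from the post: first try, then the fallback.
ATTEMPTS = [("medium.en", 3), ("large-v2", 8)]


def build_command(video, model, beam, vad_filter=True):
    """Argument list for one faster-whisper-xxl run using the batch file's parameters."""
    return [
        FASTER_WHISPER_EXE, video,
        "--compute_type=int8_float32",
        "--model_dir", MODEL_DIR, "--model", model,
        "--vad_filter", "true" if vad_filter else "false",
        "--task", "transcribe", "--sentence", "--language", "en",
        "--beam_size", str(beam), "--best_of", str(beam),
    ]


def commands_for(video, noisy=False):
    """One command per attempt, in the order they would be tried.

    noisy=True switches the VAD filter off, as suggested for noisy videos.
    """
    return [build_command(video, m, b, vad_filter=not noisy) for m, b in ATTEMPTS]
```

    Each returned list can be passed to subprocess.run(cmd, check=True) once it looks right.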


    @loninappleton Sorry, I seem to have hijacked your thread.
    I suggest you nut your way through learning to use the software from the link I posted above.
    I'm happy to provide a few single-line batch files for you to test.
    Last edited by pcspeak; 28th Oct 2024 at 14:57.
  10. Member (Join Date: Mar 2021; Location: Israel)
    You can use Spleeter to isolate the vocals from other sounds.
    https://research.deezer.com/projects/spleeter.html
    Several Windows applications use Spleeter (see the list at the link above). Give Acon Digital's Acoustica a try; it has a fully functional 30-day free trial. Or install Spleeter itself from GitHub.
    Vocal isolation might help if you suspect the noise is interfering.
    There are other apps, but they are not free.
    There are also a lot of YouTube videos on this subject.
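
    If you go the GitHub route, a run can be scripted along these lines. This is a rough sketch that assumes a recent Spleeter release is installed and on the PATH; older versions wanted -i in front of the input file. The helper names and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_spleeter_command(audio_path, output_dir):
    """Arguments for a 2-stem (vocals + accompaniment) separation with the spleeter CLI."""
    return ["spleeter", "separate", "-p", "spleeter:2stems",
            "-o", str(output_dir), str(audio_path)]


def expected_vocals_path(audio_path, output_dir):
    """By default Spleeter writes <output_dir>/<input name without extension>/vocals.wav."""
    return Path(output_dir) / Path(audio_path).stem / "vocals.wav"


def isolate_vocals(audio_path, output_dir):
    """Run the separation and return where the vocals file should be."""
    subprocess.run(build_spleeter_command(audio_path, output_dir), check=True)
    return expected_vocals_path(audio_path, output_dir)
```

    The resulting vocals.wav can then be given to Whisper in place of the original audio.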

    @loninappleton sorry I got carried away.
    I will stop here.


