I was wondering if VOSK or Whisper had any major updates for use with Subtitle Edit.
A check of the Alpha Cephei site shows the vosk-model-en-us-0.22-lgraph model still on the list.
As for Whisper, I don't know what progress there has been, if any.
Last year VOSK mentioned a major update for 2024.
There were a couple of updates yesterday that seemed to break some things on my end.
Long subtitles of about 8 seconds now cause it to freak out and jump back up to the top.
I got a perfect transcription twice with the one 8-second bit split into two.
Then I adjusted the timing to match the scene dialog, ran it again, and it was something completely different...
Sometimes it's great, but other times it just doesn't work the way you want it to,
and I guess that's just the nature of these AI tools and you gotta take it or leave it. -
Hi!
There are a plethora of options in Subtitle Edit. Which audio-to-text method/option is the fastest? The default Faster-Whisper seems, ironically, obnoxiously slow. -
@mzso
Thanks for the response. I checked Vosk a short time ago and expected their announced significant upgrade to appear, but the old model numbers are still on the site. And yes, I do everything through Subtitle Edit, not knowing anything about the command line.
Just now I'm not active in subtitling. The few I had a real _mission_ to do for my own use I've completed and put in a shareable source for others. But that's another story. The projects are all plays; some had the AI treatment or transcribed subs and needed a lot of editing, others I ran through the tools in Subtitle Edit, then finished with printed published scripts where available and added HI (hearing impaired) descriptions.
One thing that stopped me on a project was a documentary about opera star Maria Callas. Whisper would hang when it encountered musical portions and became useless. It may have been my low-end PC or the source DVD audio. I haven't gotten past those sorts of problems. -
I have been using Whisper AI (command line) ever since I read about its existence. Subtitle Edit has several options, but I prefer the command line method because it has more flexibility.
I found the best parameter to add to a run is the
--condition_on_previous_text False (Default is True)
very helpful to stop it repeating the same sentence on and on.
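For anyone who wants to script this rather than type it each time, here is a minimal Python sketch that only assembles the command line for the `whisper` command from the openai/whisper package with that option switched off. The function name, the default model and the sample file name are placeholders made up for illustration.

```python
import shlex


def build_whisper_command(media_path, model="medium", condition_on_previous_text=False):
    """Return the argument list for the openai/whisper command-line tool.

    Only the --condition_on_previous_text flag comes from the post above;
    the model choice is an assumed placeholder.
    """
    return [
        "whisper",
        str(media_path),
        "--model", model,
        # False stops each window from being conditioned on the previous text,
        # which is what helps against the repeated-sentence loops.
        "--condition_on_previous_text", str(condition_on_previous_text),
    ]


if __name__ == "__main__":
    cmd = build_whisper_command("episode01.mkv")  # hypothetical file name
    print(shlex.join(cmd))
    # To actually run it (whisper must be installed and on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list keeps file names with spaces intact when the list is later handed to `subprocess.run`.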
There are several expensive software applications that incorporate speech-to-text, such as Vegas and DaVinci Resolve Studio. I would love to give them a try, but I am not going to pay so much money without having the chance to evaluate them first.
So in the meantime I am sticking with Whisper AI
https://github.com/openai/whisper -
@Subtitles
+1
But, I use this:
https://github.com/Purfview/whisper-standalone-win/releases/tag/Faster-Whisper-XXL
Everything needed, except the language models, is in the one package.
You can copy your models over, or nominate the _models folder with
Code:
--model_dir "D:\Whisper-XXL\_models" --model large-v2
Code:
 Directory of D:\Whisper-XXL\_models\faster-whisper-medium

03/10/2024  01:13 PM    <DIR>          .
03/10/2024  01:13 PM    <DIR>          ..
03/10/2024  12:59 PM             2,574 config.json
03/10/2024  01:13 PM       788,826,555 model.bin
03/10/2024  12:59 PM               339 preprocessor_config.json
03/10/2024  12:59 PM         2,405,678 tokenizer.json
03/10/2024  12:59 PM           825,480 vocabulary.json
               6 File(s)    792,060,626 bytes

 Directory of D:\Whisper-XXL\_models\faster-whisper-large-v2

03/10/2024  01:10 PM    <DIR>          .
03/10/2024  01:10 PM    <DIR>          ..
17/08/2023  10:01 AM             2,796 config.json
17/08/2023  10:57 AM     3,086,912,962 model.bin
17/08/2023  10:01 AM         2,203,239 tokenizer.json
17/08/2023  10:01 AM           459,861 vocabulary.txt
               5 File(s)  3,089,578,858 bytes
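If you copy models over by hand, a quick check that each folder has the files shown in the listing above can save a failed run. This is only a sketch: the `faster-whisper-<name>` folder naming and the file list are taken from that listing, other models may ship different files, and the function name is made up for illustration.

```python
from pathlib import Path

# Assumption: the files seen in both folders of the listing above.
EXPECTED_FILES = ("config.json", "model.bin", "tokenizer.json")


def missing_model_files(model_dir, model_name):
    """Return the names of expected files that are absent from a model folder."""
    folder = Path(model_dir) / f"faster-whisper-{model_name}"
    missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
    # The vocabulary file shows up as vocabulary.json or vocabulary.txt.
    if not any(folder.glob("vocabulary.*")):
        missing.append("vocabulary.*")
    return missing


if __name__ == "__main__":
    for name in ("medium", "large-v2"):
        gaps = missing_model_files(r"D:\Whisper-XXL\_models", name)
        print(name, "looks complete" if not gaps else f"is missing: {gaps}")
```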
Last edited by pcspeak; 28th Oct 2024 at 11:05. Reason: Clarity
-
@pcspeak
Thanks for sharing.
How accurate is this version?
With Whisper AI I still have the problem of syncing the first few sentences of text to the audio. I just deal with it manually; it is not a big issue. -
Subtitles
Thanks for sharing.
How accurate is this version?
I'm guessing about the same. For me it's the speed.
Some of the following parameters are the defaults. I leave them in, and tweak them when looking for better results.
medium_float32bb3.cmd
Code:
@echo off
for %%a in ("d:\aa\*.mkv", "d:\aa\*.mp4", "d:\aa\*.avi", "d:\aa\*.mpg") do (
"D:\Whisper-XXL\faster-whisper-xxl.exe" "%%a" ^
--compute_type=int8_float32 ^
--model_dir "D:\Whisper-XXL\_models" --model medium.en ^
--vad_filter true --task transcribe --sentence --language en --beam_size 3 --best_of 3 --verbose true -o "%%~dpa\"
)
echo All done. Press any key to Exit. &pause>nul
Mostly I switch to the large v2 model and try --beam_size 8 --best_of 8 if I'm not happy. I don't use large v3 or distil models at all.
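For those who would rather drive this from Python than from a .cmd file, the same loop can be sketched roughly as below. The flags and paths are the ones from the batch file above; the function names and the idea of passing the retry settings as arguments are assumptions for illustration, not part of Faster-Whisper-XXL.

```python
from pathlib import Path

# Paths taken from the batch file above; adjust to your own install.
EXE = r"D:\Whisper-XXL\faster-whisper-xxl.exe"
MODEL_DIR = r"D:\Whisper-XXL\_models"
VIDEO_SUFFIXES = {".mkv", ".mp4", ".avi", ".mpg"}


def find_videos(folder):
    """Return the video files in a folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )


def build_command(video, model="medium.en", beam_size=3, best_of=3):
    """Build the faster-whisper-xxl argument list for one video."""
    return [
        EXE, str(video),
        "--compute_type=int8_float32",
        "--model_dir", MODEL_DIR, "--model", model,
        "--vad_filter", "true", "--task", "transcribe", "--sentence",
        "--language", "en",
        "--beam_size", str(beam_size), "--best_of", str(best_of),
        "--verbose", "true",
        "-o", str(Path(video).parent),  # write output next to the video
    ]


if __name__ == "__main__":
    source = Path(r"d:\aa")
    if source.is_dir():
        for video in find_videos(source):
            print(build_command(video))
            # To run for real: import subprocess; subprocess.run(build_command(video), check=True)
```

The second pass described above would then be `build_command(video, model="large-v2", beam_size=8, best_of=8)`.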
@loninappleton. Sorry, I seem to have hijacked your thread.
I suggest you nut your way through with learning to use the software from the link I posted above.
I'm happy to provide a few single line batch files for you to test.
Last edited by pcspeak; 28th Oct 2024 at 14:57.
-
You can use Spleeter to isolate the vocals from other sounds.
https://research.deezer.com/projects/spleeter.html
Several Windows applications use Spleeter (see the list at the link above). Give Acon Digital (Acoustica) a try: it has a fully functional 30-day free trial. Alternatively, install Spleeter itself from GitHub.
Vocals Isolation might help, if you suspect the noise is interfering.
There are other apps but not free.
Also a lot of YouTube videos available on this subject.
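If you go the GitHub route, the vocal isolation step is a single command. The sketch below only builds that command and works out where the vocals file should land, so you can feed it to Whisper afterwards. It assumes the Spleeter 2.x command-line syntax and its default output layout (a subfolder named after the input file, with wav output); the function names and the file names are invented for illustration, so check against your installed version.

```python
from pathlib import Path


def build_spleeter_command(audio_path, output_dir):
    """Argument list for a two-stem split (vocals + accompaniment)."""
    return [
        "spleeter", "separate",
        "-p", "spleeter:2stems",   # pretrained vocals/accompaniment model
        "-o", str(output_dir),
        str(audio_path),
    ]


def expected_vocals_path(audio_path, output_dir):
    """Where the isolated vocals should appear, assuming default settings."""
    return Path(output_dir) / Path(audio_path).stem / "vocals.wav"


if __name__ == "__main__":
    audio = "documentary.wav"  # hypothetical audio extracted from the DVD
    print(build_spleeter_command(audio, "separated"))
    print(expected_vocals_path(audio, "separated"))
    # To run for real: import subprocess; subprocess.run(build_spleeter_command(audio, "separated"), check=True)
```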
@loninappleton sorry I got carried away.
I will stop here.