Speech models, any updates

17th Sep 2024 00:41 #1
loninappleton

Member

Join Date
Jun 2005

Location
USA
I was wondering if VOSK or Whisper had any major updates for use with Subtitle Edit.
A check of Alpha Cephei shows the vosk-model-en-us-0.22-lgraph model still on the list.
As for Whisper, I don't know what progress there has been at all.

Last year, VOSK mentioned a major update for 2024.

17th Sep 2024 14:05 #2
Yosho

Member

Join Date
Dec 2004
Originally Posted by loninappleton

I was wondering if VOSK or Whisper had any major updates for use with Subtitle Edit.
A check of Alpha Cephei shows the vosk-model-en-us-0.22-lgraph model still on the list.
As for Whisper, I don't know what progress there has been at all.

Last year, VOSK mentioned a major update for 2024.

There were a couple of updates yesterday that kinda broke some things on my end, it seems.
Long subtitles (about 8 seconds now) cause it to freak out and jump back up to the top.
I got a perfect transcription twice with that one 8-second segment split into two.
Then I adjusted the timing to match the scene dialog, ran it again, and got something completely different...
Sometimes it's great, but other times it just doesn't work the way you want it to,
and I guess that's just the nature of these AI tools: you gotta take it or leave it.

17th Sep 2024 14:12 #3
loninappleton

Member

Join Date
Jun 2005

Location
USA
Thanks for answering. Sounds like neither has had any major updates.

27th Oct 2024 18:15 #4
mzso

Member

Join Date
Dec 2012
Hi!

There is a plethora of options in Subtitle Edit. Which audio-to-text method/option is the fastest? The default, Faster-Whisper, seems, ironically, obnoxiously slow.

27th Oct 2024 23:47 #5
loninappleton

Member

Join Date
Jun 2005

Location
USA
@mzso

Thanks for the response. I checked Vosk a short time ago, expecting their announced significant upgrade to have appeared, but the old model numbers are still on the site. And yes, I do everything through Subtitle Edit, since I don't know anything about the command line.

Right now I'm not active in subtitling. The few projects I had a real _mission_ to do for my own use are completed and in a shareable source for others, but that's another story. The projects are all plays. Some had the AI treatment or transcribed subs and needed a lot of editing; for others I ran the tools in Subtitle Edit, then finished with the available printed, published scripts and added HI (hearing impaired) descriptions.

One thing that stopped me on a project was a documentary about opera star Maria Callas. Whisper would hang when it encountered musical passages and became useless. It may have been my low-end PC or the source DVD audio. I haven't gotten past those sorts of problems.

28th Oct 2024 06:16 #6
Subtitles

Member

Join Date
Mar 2021

Location
Israel
Originally Posted by mzso

Hi!

There is a plethora of options in Subtitle Edit. Which audio-to-text method/option is the fastest? The default, Faster-Whisper, seems, ironically, obnoxiously slow.

I have been using Whisper AI (command line) ever since I read about its existence. Subtitle Edit has several options, but I prefer the command-line method because it offers more flexibility.
The most useful parameter I have found to add to a run is
--condition_on_previous_text False (default is True)
It is very helpful for stopping Whisper from repeating the same sentence over and over.
There are several expensive applications that incorporate speech-to-text, such as Vegas and DaVinci Resolve Studio. I would love to try them, but I am not going to pay that much money without a chance to evaluate them first.
So in the meantime I am sticking with Whisper AI:
https://github.com/openai/whisper
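For anyone who wants to try that parameter from a script, here is a minimal Python sketch that assembles an openai-whisper command line with --condition_on_previous_text False. It assumes the whisper executable from the repository above is on your PATH; the file name movie.mkv, the medium model and SRT output are illustrative assumptions, not the settings used in this post.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_whisper_command(media: str, model: str = "medium",
                          language: str = "en",
                          condition_on_previous_text: bool = False,
                          output_dir: str = ".") -> list[str]:
    """Build an argument list for the openai-whisper CLI.

    Assumption: the `whisper` executable is on PATH.
    """
    return [
        "whisper", str(Path(media)),
        "--model", model,
        "--language", language,
        # The CLI parses this flag from the literal strings "True"/"False".
        "--condition_on_previous_text", str(condition_on_previous_text),
        "--output_format", "srt",
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_whisper_command("movie.mkv")))
```

To actually run it, pass the list to subprocess.run(cmd, check=True). Keeping the arguments as a list rather than one string avoids quoting trouble with paths that contain spaces.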

28th Oct 2024 10:49 #7
pcspeak

Member

@Subtitles
+1
But I use this:
https://github.com/Purfview/whisper-standalone-win/releases/tag/Faster-Whisper-XXL
Everything needed, except the language models, is in that one package.

You can copy your models over, or point it at the _models folder with

Code:

--model_dir "D:\Whisper-XXL\_models" --model large-v2

For me, the language models are here:

Code:

 Directory of D:\Whisper-XXL\_models\faster-whisper-medium
03/10/2024  01:13 PM    <DIR>          .
03/10/2024  01:13 PM    <DIR>          ..
03/10/2024  12:59 PM             2,574 config.json
03/10/2024  01:13 PM       788,826,555 model.bin
03/10/2024  12:59 PM               339 preprocessor_config.json
03/10/2024  12:59 PM         2,405,678 tokenizer.json
03/10/2024  12:59 PM           825,480 vocabulary.json
               6 File(s)    792,060,626 bytes

 Directory of D:\Whisper-XXL\_models\faster-whisper-large-v2
03/10/2024  01:10 PM    <DIR>          .
03/10/2024  01:10 PM    <DIR>          ..
17/08/2023  10:01 AM             2,796 config.json
17/08/2023  10:57 AM     3,086,912,962 model.bin
17/08/2023  10:01 AM         2,203,239 tokenizer.json
17/08/2023  10:01 AM           459,861 vocabulary.txt
               5 File(s)  3,089,578,858 bytes
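Going only by the two listings above, a quick check that a model folder is complete before pointing --model_dir at it could look like this Python sketch. The helper missing_model_files is hypothetical. It treats config.json, model.bin, tokenizer.json and a vocabulary file (.json or .txt, since the two folders differ) as required; preprocessor_config.json only appears in the medium folder, so it is treated as optional. That file list is an assumption drawn from these listings, not from Faster-Whisper documentation.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: files present in both listings above; vocabulary is .json or .txt.
REQUIRED = ("config.json", "model.bin", "tokenizer.json")
VOCAB_CHOICES = ("vocabulary.json", "vocabulary.txt")


def missing_model_files(model_dir: str | Path) -> list[str]:
    """Return names of expected files that are absent from a model folder."""
    folder = Path(model_dir)
    missing = [name for name in REQUIRED if not (folder / name).is_file()]
    if not any((folder / v).is_file() for v in VOCAB_CHOICES):
        missing.append("vocabulary.json|vocabulary.txt")
    return missing


if __name__ == "__main__":
    models = Path(r"D:\Whisper-XXL\_models")
    for sub in ("faster-whisper-medium", "faster-whisper-large-v2"):
        print(sub, missing_model_files(models / sub) or "OK")
```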

Last edited by pcspeak; 28th Oct 2024 at 11:05. Reason: Clarity

28th Oct 2024 11:51 #8
Subtitles

Member

Join Date
Mar 2021

Location
Israel
@pcspeak
Thanks for sharing.
How accurate is this version?
With Whisper AI I still have trouble syncing the first few sentences of text to the audio. I just deal with it manually; it's not a big issue.

28th Oct 2024 14:46 #9
pcspeak

Member

Join Date
Apr 2007

Location
Australia
Originally Posted by Subtitles
Thanks for sharing.
How accurate is this version?

Can't say; I haven't used your version for 8 months. Since both use the same language models,
I'd guess the accuracy is about the same. For me, the draw is the speed.

Some of the following parameters are the defaults. I leave them in and tweak them when looking for better results.
medium_float32bb3.cmd

Code:

@echo off
for %%a in ("d:\aa\*.mkv", "d:\aa\*.mp4", "d:\aa\*.avi", "d:\aa\*.mpg") do (
  "D:\Whisper-XXL\faster-whisper-xxl.exe" "%%a" ^
  --compute_type=int8_float32 ^
  --model_dir "D:\Whisper-XXL\_models" --model medium.en ^
  --vad_filter true --task transcribe --sentence --language en --beam_size 3 --best_of 3 --verbose true -o "%%~dpa\"
)
echo All done. Press any key to Exit. &pause>nul
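The same loop can be sketched in Python, which may be easier to adapt than batch syntax. This is a rough equivalent that assumes pcspeak's paths and uses only the flags from medium_float32bb3.cmd; build_command and run_folder are hypothetical names, and tying --best_of to --beam_size is a simplification that happens to match the values used in this thread.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Assumption: same install location and model folder as the batch file.
EXE = Path(r"D:\Whisper-XXL\faster-whisper-xxl.exe")
MODEL_DIR = r"D:\Whisper-XXL\_models"
EXTENSIONS = {".mkv", ".mp4", ".avi", ".mpg"}


def build_command(media: Path, model: str = "medium.en", beam: int = 3) -> list[str]:
    """Mirror the flags of medium_float32bb3.cmd for one input file."""
    return [
        str(EXE), str(media),
        "--compute_type=int8_float32",
        "--model_dir", MODEL_DIR, "--model", model,
        "--vad_filter", "true", "--task", "transcribe", "--sentence",
        "--language", "en",
        # Simplification: best_of always equals beam_size here.
        "--beam_size", str(beam), "--best_of", str(beam),
        "--verbose", "true",
        "-o", str(media.parent),  # write output next to the video, like %%~dpa
    ]


def run_folder(folder: Path, **kwargs) -> None:
    """Transcribe every matching video in `folder`, one after another."""
    for media in sorted(folder.iterdir()):
        if media.suffix.lower() in EXTENSIONS:
            subprocess.run(build_command(media, **kwargs), check=True)
    print("All done.")


if __name__ == "__main__" and EXE.exists():
    run_folder(Path(r"d:\aa"))
```

For the large-v2 fallback mentioned below, build_command(media, model="large-v2", beam=8) produces --beam_size 8 --best_of 8 with that model.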

I find --vad_filter false helps with noisy videos, e.g. motor racing.
If I'm not happy with the result, I usually switch to the large-v2 model and try --beam_size 8 --best_of 8. I don't use the large-v3 or distil models at all.

@loninappleton, sorry, I seem to have hijacked your thread.
I suggest you work your way through learning the software from the link I posted above.
I'm happy to provide a few single-line batch files for you to test.
Last edited by pcspeak; 28th Oct 2024 at 14:57.
28th Oct 2024 15:07 #10
Subtitles

Member

Join Date
Mar 2021

Location
Israel
You can use Spleeter to isolate the vocals from other sounds.
https://research.deezer.com/projects/spleeter.html
Several Windows applications use Spleeter (see the list at the link above). Give Acon Digital (Acoustica) a try: it has a fully functional 30-day free trial. Alternatively, install Spleeter itself from GitHub.
Vocal isolation might help if you suspect the background sound is interfering.
There are other apps, but they are not free.
There are also a lot of YouTube videos on this subject.
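A minimal sketch of the Spleeter route for a case like the Callas documentary: split the soundtrack into vocals and accompaniment, then transcribe only the vocals. It assumes the spleeter command-line tool (2.x) is installed and that the audio has already been extracted from the DVD to a file; callas.wav and the sep output folder are hypothetical names. The output location follows Spleeter's default naming of output folder / file name / vocals.wav.

```python
from __future__ import annotations

from pathlib import Path


def spleeter_command(audio: Path, out_dir: Path) -> list[str]:
    """Two-stem split (vocals + accompaniment) with the Spleeter CLI.

    Assumption: the `spleeter` executable is on PATH.
    """
    return ["spleeter", "separate", "-p", "spleeter:2stems",
            "-o", str(out_dir), str(audio)]


def vocals_path(audio: Path, out_dir: Path) -> Path:
    """Where Spleeter's default naming should place the isolated vocals."""
    return out_dir / audio.stem / "vocals.wav"


if __name__ == "__main__":
    src, out = Path("callas.wav"), Path("sep")
    print(" ".join(spleeter_command(src, out)))
    print("then transcribe:", vocals_path(src, out))
```

Run the command with subprocess.run(..., check=True), then hand the vocals file to Whisper. The separated track should keep the original running time, so its timings ought to line up with the video, but it is worth spot-checking.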

@loninappleton, sorry, I got carried away.
I will stop here.
