Standalone Faster-Whisper - Portable AI auto-transcription-translation


29th Apr 2023 03:14 #1
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
Whisper is a state-of-the-art auto-transcription-translation model, described in "Robust Speech Recognition via Large-Scale Weak Supervision".

Previous audio-to-text implementations were at a "meh" level; Whisper really changes the game.

Here are my compiled binaries for newbies: https://github.com/Purfview/whisper-standalone-win

Last edited by VoodooFX; 30th Oct 2023 at 12:45.

InpaintDelogo - advanced logo removal & hardcoded subtitles extraction
Standalone Faster-Whisper - Portable AI auto-transcription-translation

29th Apr 2023 18:01 #2
loninappleton

Member

Join Date
Jun 2005

Location
USA
I am interested in learning more about this but I have no experience in command line work for things like Whisper Faster.

Unrelated, I can report that I did get a full teleplay from years past to run in Whisper with the small model in Subtitle Edit.

Inspection of the video with audio still shows significant errors. I remember another sub I'm working on for a public domain play containing the word usquebaugh. That was a fun one to look up. For now I know that I'll have to do manual corrections yet.

Perhaps a larger model such as medium shown above would reduce the errors but my test with Whisper in SE showed that the larger model will not run on my build.

Please give details on more of this for installing the program elements, model etc.

[edit] My Folder for Whisper Fast is on the desktop, probably not the best place for constructing the path. Anyway I'd need those sorts of details.

My Whisper sample which is 94 mins flies along pretty fast. But I don't know how to get an answer on a separate question on manual editing with Subtitle Edit (to add graphics like music notes and such.) Does SE manipulate the timings with the program? I have no adequate answer for that with minimum space in SE Settings at the lowest default.

Maybe a different sub program would handle this differently.

Last edited by loninappleton; 29th Apr 2023 at 19:54. Reason: grammar

29th Apr 2023 19:06 #3
davexnet

Member

Join Date
Mar 2008

Location
United States
I thought --vad_max_speech_duration_s 5
would break it up into lines of speech 5 seconds maximum

from https://pypi.org/project/whisper-ctranslate2/

Code:

--vad_max_speech_duration_s VALUE (int) Maximum duration of speech chunks in seconds. Longer will be split at the timestamp of the last silence.

but I still get this in the results
10
00:05:26,510 --> 00:06:09,040
Soldat. Guten Abend, Soldat. Was hast du für einen feinen Säbel und einen großen Turnister? Du bist ein richtiger Soldat.

In the above, the first "soldat" was spoken at the time shown, but the next words "Guten Abend" not until about 20 seconds later
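A quick way to find cues like this one is to scan the SRT output for entries longer than the limit you expected. The following is only a minimal sketch: the helper names `to_seconds` and `long_cues` are made up for illustration, and it assumes standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
import re

# Matches an SRT timing line such as "00:05:26,510 --> 00:06:09,040".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert SRT time fields (strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def long_cues(srt_text, max_seconds):
    """Return (start, end, duration) for every cue longer than max_seconds."""
    found = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start = to_seconds(*fields[:4])
        end = to_seconds(*fields[4:])
        duration = round(end - start, 3)
        if duration > max_seconds:
            found.append((start, end, duration))
    return found


# The cue quoted in the post above.
sample = """10
00:05:26,510 --> 00:06:09,040
Soldat. Guten Abend, Soldat. Was hast du für einen feinen Säbel und einen großen Turnister? Du bist ein richtiger Soldat.
"""

for start, end, duration in long_cues(sample, max_seconds=5):
    print(f"cue from {start:.3f}s to {end:.3f}s lasts {duration:.2f}s")
```

For the cue above this reports a duration of about 42.5 seconds, well over the 5 seconds requested with --vad_max_speech_duration_s.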
30th Apr 2023 08:40 #4
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
Originally Posted by loninappleton

run in Whisper with the small model in Subtitle Edit.

There are five different Whisper implementations in SE, so saying just "Whisper" doesn't tell us much.

Originally Posted by loninappleton

Please give details on more of this for installing the program elements, model etc.

There is no "install", it's ready to run after unpacking. Don't worry if you don't know how to download models manually: the model is downloaded automatically on the first run if it isn't found.

Originally Posted by loninappleton

My Folder for Whisper Fast is on the desktop.

You don't want to copy portable programs to Windows folders, keep it somewhere like "D:\Faster-Whisper".

Originally Posted by loninappleton

But I don't know how to get an answer on a separate question on manual editing with Subtitle Edit (to add graphics like music notes and such.)

Questions about Subtitle Edit should be asked in the Subtitle Edit thread, or create a new one.

30th Apr 2023 08:50 #5
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
Originally Posted by davexnet

I thought --vad_max_speech_duration_s 5
would break it up into lines of speech 5 seconds maximum

I preset VAD defaults that I find work well for movies; I don't recommend changing them.

Try --verbose True; you should see some debug output there. Check whether that "20 seconds" segment is removed as no speech.

30th Apr 2023 11:33 #6
loninappleton

Member

Join Date
Jun 2005

Location
USA
@VoodooFX

Thanks,

I'll try not to mix topics from now on.

30th Apr 2023 15:07 #7
loninappleton

Member

Join Date
Jun 2005

Location
USA
A brief question on Whisper Faster used with Subtitle Edit:

Is there a code sequence that does this or is it in the standard Subtitle Edit presentation (screen) ? It was noted in the
other documentation that SE has the path installed for the required ffmpeg.

--

I moved the Whisper Faster folder to C:/

I made a 5 min clip in MKVToolNix, also in root.

Looking at the folder contents I do see whisper.exe. Selecting that just briefly opens a CMD box. I admit I don't understand the workings.

If there are additional or replacement installs for the binaries etc, please describe that.

Also, the code above does not show a print to SRT but that may be in the folder content and called when needed.

Time will hopefully work all this out for inexperienced users like me.

30th Apr 2023 15:12 #8
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
You were provided with a guide in another thread, but for some reason you refused to watch it.

30th Apr 2023 16:45 #9
davexnet

Member

Join Date
Mar 2008

Location
United States
Originally Posted by VoodooFX

Originally Posted by davexnet

I thought --vad_max_speech_duration_s 5
would break it up into lines of speech 5 seconds maximum

I preset VAD defaults that I find work well for movies; I don't recommend changing them.

Try --verbose True; you should see some debug output there. Check whether that "20 seconds" segment is removed as no speech.

Hi VoodooFX - here's the verbose output, I've not been able to spot the item you referred to.

Attached Files

whisper_faster_verbose.txt (4.2 KB, 159 views)
30th Apr 2023 17:17 #10
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
Originally Posted by davexnet

Hi VoodooFX - here's the verbose output, I've not been able to spot the item you referred to.

It's there at the top:

Code:

DEBUG - VAD filter kept the following audio segments:....[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], [05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708].....

Looks like there is silence gap detected by VAD, dunno why Whisper keeps them in one line, you better ask the dev in Faster-Whisper repo.
30th Apr 2023 19:38 #11
davexnet

Member

Join Date
Mar 2008

Location
United States
Originally Posted by VoodooFX

Originally Posted by davexnet

Hi VoodooFX - here's the verbose output, I've not been able to spot the item you referred to.

It's there at the top:

Code:

DEBUG - VAD filter kept the following audio segments:....[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], [05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708].....

Looks like there is silence gap detected by VAD, dunno why Whisper keeps them in one line, you better ask the dev in Faster-Whisper repo.

Ok, I'll see what I can find. Looking at the above, the woman says "Soldat" at 5:25 and "Guten Abend" at 5:50, with nothing in between,
so according to the above [05:43.292 -> 05:46.052] is a "kept" segment, yet it contains no speech.
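To see how much silence VAD detected between the kept segments, the debug line can be parsed and the gaps computed. This is a rough sketch only: `parse_segments` and `gaps` are hypothetical helper names, and it assumes the `[MM:SS.mmm -> MM:SS.mmm]` format exactly as it appears in the excerpt quoted above (longer files may print timestamps differently).

```python
import re

# One kept segment, as printed in the DEBUG line: [05:25.564 -> 05:28.388]
SEGMENT = re.compile(r"\[(\d+):(\d+\.\d+) -> (\d+):(\d+\.\d+)\]")


def parse_segments(debug_line):
    """Return [(start_seconds, end_seconds), ...] for each kept segment."""
    return [
        (int(m1) * 60 + float(s1), int(m2) * 60 + float(s2))
        for m1, s1, m2, s2 in SEGMENT.findall(debug_line)
    ]


def gaps(segments):
    """Return the silence between each kept segment and the next, in seconds."""
    return [
        round(nxt[0] - cur[1], 3)
        for cur, nxt in zip(segments, segments[1:])
    ]


# The four segments quoted from the verbose output.
line = (
    "[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], "
    "[05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708]"
)

segments = parse_segments(line)
for (_, end), gap in zip(segments, gaps(segments)):
    print(f"segment ending at {end:.3f}s is followed by {gap:.3f}s of silence")
```

For the four segments quoted, the gaps come out at roughly 14.9, 3.6 and 3.1 seconds, so the long pause after the first "Soldat" was detected by VAD even though the words ended up in a single subtitle line.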
30th Apr 2023 19:47 #12
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
Some no-speech segments are kept, some speech segments are detected as silence, AI stuff is not perfect, yet.

2nd May 2023 15:40 #13
loninappleton

Member

Join Date
Jun 2005

Location
USA
Originally Posted by VoodooFX

You were provided with a guide in another thread, but for some reason you refused to watch it.

I did see the 15 second one (didn't load it). I am still a fussbudget.

Also I did locate how to access 'discussion' on Github where I needed to make a new login and password. That is not totally resolved yet.

I am downloading the 1.6 version from your page and will see how that goes.

Following that I should be able to copy in what you have on the CMD screen if I put a clip in the root where the program content is.
Stop me wherever you like.

Even if I have to bail on all this, I'm still glad that Whisper is finally working for me with the corrected instructions for use in Subtitle Edit. Would Powershell add any convenience to reloading the command screen with a saved script? I still don't know anything, just asking things I think of.

4th May 2023 00:37 #14
loninappleton

Member

Join Date
Jun 2005

Location
USA

Originally Posted by VoodooFX

You were provided with a guide in another thread, but for some reason you refused to watch it.

Apologies,

I retraced the link for the CMD guide, which is 18 mins, not 15 secs. The 15 secs is what I saw when it opened, and I thought it was just about getting to CMD.

On my own I have Whisper-Faster installed at c:\ and checked it with just the couple of CMD commands I remember.

I will likely practice the command string you present here at top just on paper which gives me better understanding of it. I have a new short clip at c:\ as well. I still expect error messages for bad technique or whatever.

4th May 2023 01:11 #15
loninappleton

Member

Join Date
Jun 2005

Location
USA
When I expect errors it's less stressful.

From above when I put everything in c: I wrote the one line script and here is the result

Directory of C:\

08/17/2019 03:00 PM 23,460,413 Making War Horse TEST.m4v
12/07/2019 04:14 AM <DIR> PerfLogs
04/03/2023 05:06 PM <DIR> Program Files
04/03/2023 05:10 PM <DIR> Program Files (x86)
01/13/2023 03:20 PM <DIR> Python39
01/04/2023 01:08 AM <DIR> Users
05/03/2023 11:27 PM <DIR> Whisper-Faster
05/04/2023 01:01 AM 0 whisper.exe
05/03/2023 05:21 PM <DIR> Windows
12/20/2022 05:19 PM <DIR> Windows.old
12/24/2022 08:19 PM <DIR> Windows.old.000
01/11/2023 08:15 PM <DIR> Windows.old.001
09/29/2022 11:22 PM <DIR> You Tube How To all png images March 2 incomplete
2 File(s) 23,460,413 bytes
11 Dir(s) 917,225,558,016 bytes free

C:\>Whisper-Faster>whisper.exe --language English --model "medium" "C:\Making War Horse TEST.m4v"
'Whisper-Faster' is not recognized as an internal or external command,
operable program or batch file.

C:\>

I'm only going on the example given.
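A note on the error above: in CMD, the text before the `>` in an example such as `C:\Whisper-Faster>` is normally the prompt, not part of the command. Typed at `C:\>`, the line `Whisper-Faster>whisper.exe ...` is read as "run a program called Whisper-Faster and redirect its output to a file named whisper.exe", which would explain both the error message and the 0-byte whisper.exe in the listing. The sketch below shows one way to build the same call without relying on the prompt or the current folder. It assumes the executable sits at `C:\Whisper-Faster\whisper.exe` (inferred from the posts, not confirmed) and uses only the options shown in the post; `build_command` is a name made up for illustration.

```python
import os
import subprocess


def build_command(exe, media, language="English", model="medium"):
    """Build the argument list for one transcription run.

    Each list item is passed as a single argument, so a path that
    contains spaces needs no extra quoting.
    """
    return [exe, "--language", language, "--model", model, media]


# Assumed locations, taken from the directory listing in the post.
EXE = r"C:\Whisper-Faster\whisper.exe"
MEDIA = r"C:\Making War Horse TEST.m4v"

cmd = build_command(EXE, MEDIA)
print(cmd)

# Only launch the program if the executable is really there.
if os.path.isfile(EXE):
    subprocess.run(cmd, check=True)
else:
    print("Executable not found, nothing was run:", EXE)
```

Calling the executable by its full path avoids depending on which folder the command window happens to be in.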

4th May 2023 06:14 #16
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
https://www.youtube.com/watch?v=A3nwRCV-bTU

12th Oct 2023 09:10 #17
VoodooFX

Video Damager

Join Date
Oct 2021

Location
At Doom9
Added Linux and Mac OS X executables.

22nd Oct 2023 04:41 #18
sipho

Member

Join Date
Dec 2022

Location
Lesotho
@VoodooFX Wow, thank you very much. Ran your Windows binary with the large-v2 model and the results were outstanding. Took 3322 secs (on average) per 45 min episode though. I haven't tinkered with any of the performance settings yet, so that's next on my list.
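To put those timings in perspective, processing time can be divided by audio length to get a real-time factor. This is plain arithmetic on the two numbers quoted above; `real_time_factor` is just an illustrative name.

```python
def real_time_factor(processing_seconds, audio_minutes):
    """Processing time divided by audio duration.

    A value above 1.0 means transcription is slower than real time.
    """
    return processing_seconds / (audio_minutes * 60)


# 3322 seconds of processing for a 45 minute episode.
rtf = real_time_factor(3322, 45)
print(f"real-time factor: {rtf:.2f}")
```

With these figures the factor is about 1.23, so each episode takes somewhat longer to transcribe than it does to play.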
