Whisper is a state-of-the-art automatic transcription and translation model, from the paper "Robust Speech Recognition via Large-Scale Weak Supervision".
Previous audio-to-text implementations were at a "meh" level; Whisper really changes the game.
Here are my compiled binaries for newbies: https://github.com/Purfview/whisper-standalone-win
-
Last edited by VoodooFX; 30th Oct 2023 at 12:45.
-
I am interested in learning more about this, but I have no experience with the command line for tools like Whisper Faster.
Separately, I can report that I got a full teleplay from years past to run through Whisper with the small model in Subtitle Edit.
Checking the video with audio still shows significant errors. Another sub I'm working on, for a public-domain play, contains the word "usquebaugh"; that was a fun one to look up. For now I know I'll still have to do manual corrections.
Perhaps a larger model, such as the medium one shown above, would reduce the errors, but my test with Whisper in SE showed that the larger model will not run on my build.
Please give more details on installing the program components, the model, etc.
[edit] My folder for Whisper Faster is on the desktop, which is probably not the best place when it comes to constructing the path. Either way, I'd need those sorts of details.
My 94-minute Whisper sample flies along pretty fast. But I don't know where to get an answer to a separate question about manual editing in Subtitle Edit (adding symbols such as music notes). Does SE adjust the timings itself? I haven't found an adequate answer, even with the minimum space in SE Settings at the lowest default.
Maybe a different subtitle program would handle this differently.
Last edited by loninappleton; 29th Apr 2023 at 19:54. Reason: grammar
-
I thought --vad_max_speech_duration_s 5 would break the output into lines of at most 5 seconds of speech, going by this from https://pypi.org/project/whisper-ctranslate2/:
Code:
--vad_max_speech_duration_s VALUE (int) Maximum duration of speech chunks in seconds. Longer will be split at the timestamp of the last silence.
But I still get this in the results:
10
00:05:26,510 --> 00:06:09,040
Soldat. Guten Abend, Soldat. Was hast du für einen feinen Säbel und einen großen Turnister? Du bist ein richtiger Soldat.
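In the above (roughly: "Soldier. Good evening, soldier. What a fine sabre and big knapsack you have! You're a real soldier."), the first "Soldat" was spoken at the time shown, but the next words, "Guten Abend", were not spoken until about 20 seconds later.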
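As a quick way to spot cues like the one above, here is a minimal Python sketch (not part of whisper-ctranslate2; the function names are hypothetical) that reads SRT text and lists every cue longer than a chosen limit, 5 seconds here to match the value tried above. It only assumes the standard SRT layout: cue number, an HH:MM:SS,mmm --> HH:MM:SS,mmm timing line, then the text.

```python
import re

# Matches an SRT timing line such as "00:05:26,510 --> 00:06:09,040".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert SRT timestamp fields (strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def long_cues(srt_text, max_seconds=5.0):
    """Return (cue_number, start, end, duration) for cues longer than max_seconds."""
    results = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.search(lines[1])
        if not match:
            continue
        start = to_seconds(*match.groups()[:4])
        end = to_seconds(*match.groups()[4:])
        if end - start > max_seconds:
            results.append((int(lines[0]), start, end, round(end - start, 3)))
    return results


sample = """10
00:05:26,510 --> 00:06:09,040
Soldat. Guten Abend, Soldat. Was hast du für einen feinen Säbel und einen großen Turnister? Du bist ein richtiger Soldat."""

print(long_cues(sample))
```

For the cue shown, it reports a duration of about 42.5 seconds, so the 5-second VAD setting evidently did not cap the length of the subtitle line.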
There are five different Whisper implementations in SE, so saying just "Whisper" doesn't tell us much.
There is no "install"; it's ready to run after unpacking. Don't worry if you don't know how to download models manually: the model is downloaded automatically on the first run if it isn't found.
You don't want to copy portable programs into Windows folders; keep them somewhere like "D:\Faster-Whisper".
Questions about Subtitle Edit should be asked in the Subtitle Edit thread, or in a new thread.
-
A brief question on Whisper Faster used with Subtitle Edit:
Is there a command sequence that does this, or is it part of the standard Subtitle Edit interface (screen)? The other documentation noted that SE already has the path set up for the required ffmpeg.
--
I moved the Whisper Faster folder to C:\
I made a 5-minute clip with MKVToolNix, also in the root.
Looking at the folder contents, I do see whisper.exe. Opening it just briefly flashes a CMD box. I admit I don't understand how it works.
If additional or replacement installs are needed for the binaries, etc., please describe them.
Also, the command above doesn't show any option for writing an SRT, but that may be handled by something in the folder and called when needed.
Hopefully, with time, all this will get worked out for inexperienced users like me.
You were provided with a guide in another thread, but for some reason you refused to watch it.
-
It's there at the top:
Code:
DEBUG - VAD filter kept the following audio segments:....[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], [05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708].....
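To line this DEBUG output up against the subtitle timings, a minimal Python sketch (hypothetical helper names; it assumes the MM:SS.mmm format seen in this excerpt and does not handle an hours field) can pull out the kept segments and measure the silence between consecutive ones:

```python
import re

# Matches segment pairs like "[05:25.564 -> 05:28.388]" as printed in the DEBUG line.
# Assumption: timestamps are MM:SS.mmm; an hours field is not handled.
SEGMENT = re.compile(r"\[(\d+):(\d{2})\.(\d{3}) -> (\d+):(\d{2})\.(\d{3})\]")


def parse_segments(log_line):
    """Return a list of (start, end) tuples in seconds."""
    segments = []
    for m in SEGMENT.finditer(log_line):
        start = int(m[1]) * 60 + int(m[2]) + int(m[3]) / 1000
        end = int(m[4]) * 60 + int(m[5]) + int(m[6]) / 1000
        segments.append((start, end))
    return segments


def gaps(segments):
    """Return the silence in seconds between each pair of consecutive kept segments."""
    return [round(b[0] - a[1], 3) for a, b in zip(segments, segments[1:])]


log = ("DEBUG - VAD filter kept the following audio segments:...."
       "[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], "
       "[05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708].....")

segs = parse_segments(log)
print(gaps(segs))
```

For the four segments visible here, the gaps come out to about 14.9, 3.6 and 3.1 seconds, so the 00:05:26,510 --> 00:06:09,040 cue spans several separate VAD segments rather than one continuous stretch of speech.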
-
Some no-speech segments are kept and some speech segments are detected as silence; AI stuff is not perfect yet.
-
I did see the 15-second one (didn't load it). I am still a fussbudget.
Also, I found how to access 'Discussions' on GitHub, which required making a new login and password. That is not totally resolved yet.
I am downloading the 1.6 version from your page and will see how that goes.
Following that, I should be able to copy in what you have on the CMD screen if I put a clip in the root folder where the program files are.
Stop me wherever you like.
Even if I have to bail on all this, I'm still glad that Whisper is finally working for me with the corrected instructions for use in Subtitle Edit. Would PowerShell make it more convenient to rerun the command from a saved script? I still don't know anything; I'm just asking things as I think of them.
Apologies,
I retraced the link for the CMD guide, which is 18 minutes, not 15 seconds; but that's what I saw when it opened, and I thought it was just showing how to get to CMD.
On my own, I have Whisper-Faster installed at C:\ and checked it with the couple of CMD commands I remember.
I will likely practice the command string you present at the top on paper first, which gives me a better understanding of it. I have a new short clip at C:\ as well. I still expect error messages from bad technique or whatever.
When I expect errors it's less stressful.
Following the above, I put everything in C:\ and wrote the one-line command; here is the result:
Directory of C:\
08/17/2019 03:00 PM 23,460,413 Making War Horse TEST.m4v
12/07/2019 04:14 AM <DIR> PerfLogs
04/03/2023 05:06 PM <DIR> Program Files
04/03/2023 05:10 PM <DIR> Program Files (x86)
01/13/2023 03:20 PM <DIR> Python39
01/04/2023 01:08 AM <DIR> Users
05/03/2023 11:27 PM <DIR> Whisper-Faster
05/04/2023 01:01 AM 0 whisper.exe
05/03/2023 05:21 PM <DIR> Windows
12/20/2022 05:19 PM <DIR> Windows.old
12/24/2022 08:19 PM <DIR> Windows.old.000
01/11/2023 08:15 PM <DIR> Windows.old.001
09/29/2022 11:22 PM <DIR> You Tube How To all png images March 2 incomplete
2 File(s) 23,460,413 bytes
11 Dir(s) 917,225,558,016 bytes free
C:\>Whisper-Faster>whisper.exe --language English --model "medium" "C:\Making War Horse TEST.m4v"
'Whisper-Faster' is not recognized as an internal or external command,
operable program or batch file.
C:\>
I'm only going on the example given.
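The error message shows that CMD treated "Whisper-Faster" as the command: at the C:\> prompt, ">" is the output-redirection operator, so the typed line was read as running Whisper-Faster with its output sent to a file named whisper.exe (which would likely also explain the 0-byte whisper.exe in the listing). A prompt such as C:\Whisper-Faster> is not typed; you change into the folder with cd C:\Whisper-Faster and then type only the part starting at whisper.exe. As a sketch of a saved, re-runnable script, the Python below runs the same command with the same options. The folder and file paths are assumptions taken from the listing above, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: paths taken from the directory listing above; adjust for your setup.
WHISPER_DIR = Path(r"C:\Whisper-Faster")
MEDIA_FILE = Path(r"C:\Making War Horse TEST.m4v")


def build_command(exe, media, language="English", model="medium"):
    """Build the argument list; the CMD prompt text is never part of it."""
    return [str(exe), "--language", language, "--model", model, str(media)]


def run_whisper():
    exe = WHISPER_DIR / "whisper.exe"
    cmd = build_command(exe, MEDIA_FILE)
    # Passing a list bypasses the shell, so ">" can never become a redirection,
    # and the path containing spaces is passed as a single argument.
    # cwd=WHISPER_DIR mirrors running "cd C:\Whisper-Faster" first.
    return subprocess.run(cmd, cwd=WHISPER_DIR, check=True)


if __name__ == "__main__":
    if (WHISPER_DIR / "whisper.exe").exists():
        run_whisper()
    else:
        print("whisper.exe not found in", WHISPER_DIR)
```

Keeping the options in build_command means switching to another model (for example large-v2) is a one-word change when the script is rerun.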
Added Linux and Mac OS X executables.
-
@VoodooFX Wow, thank you very much. I ran your Windows binary with the large-v2 model and the results were outstanding. It took 3322 seconds on average per 45-minute episode, though. I haven't tinkered with any of the performance settings yet, so that's next on my list.