VideoHelp Forum
  1. Video Damager VoodooFX
    Join Date
    Oct 2021
    Location
    At Doom9
    Whisper is a state of the art auto-transcription-translation model - Robust Speech Recognition via Large-Scale Weak Supervision



    Previous audio-to-text implementations were at a "meh" level; Whisper really changes the game.

    Here are my compiled binaries for newbies: https://github.com/Purfview/whisper-standalone-win
  2. I am interested in learning more about this but I have no experience in command line work for things like Whisper Faster.

    Unrelated, I can report that I did get a full teleplay from years past to run in Whisper with the small model in Subtitle Edit.

    Inspection of the video with audio still shows significant errors. I remember another sub I'm working on for a public domain play containing the word usquebaugh. That was a fun one to look up. For now I know that I'll still have to do manual corrections.

    Perhaps a larger model such as medium shown above would reduce the errors but my test with Whisper in SE showed that the larger model will not run on my build.

    Please give details on more of this for installing the program elements, model etc.

    [edit] My Folder for Whisper Fast is on the desktop, probably not the best place for constructing the path. Anyway I'd need those sorts of details.

    My Whisper sample which is 94 mins flies along pretty fast. But I don't know how to get an answer on a separate question on manual editing with Subtitle Edit (to add graphics like music notes and such.) Does SE manipulate the timings with the program? I have no adequate answer for that with minimum space in SE Settings at the lowest default.

    Maybe a different sub program would handle this differently.
    Last edited by loninappleton; 29th Apr 2023 at 19:54. Reason: grammar
  3. Member
    Join Date
    Mar 2008
    Location
    United States
    I thought --vad_max_speech_duration_s 5
    would break it up into lines of speech 5 seconds maximum

    from https://pypi.org/project/whisper-ctranslate2/

    Code:
    --vad_max_speech_duration_s VALUE (int)
    Maximum duration of speech chunks in seconds. Longer will be split at the timestamp of the last silence.

    but I still get this in the results
    10
    00:05:26,510 --> 00:06:09,040
    Soldat. Guten Abend, Soldat. Was hast du für einen feinen Säbel und einen großen Turnister? Du bist ein richtiger Soldat.

    In the above, the first "soldat" was spoken at the time shown, but the next words "Guten Abend" not until about 20 seconds later
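
    For reference, the length of that cue can be checked against the 5-second limit with a few lines of Python. This is only a sketch: the helper names srt_time_to_seconds and cue_duration are made up for illustration and are not part of Whisper or whisper-ctranslate2.

```python
import re

# Hypothetical helper names; not part of any Whisper tool.
SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:05:26,510' to seconds."""
    match = SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000

def cue_duration(timing_line: str) -> float:
    """Return the length in seconds of an SRT timing line 'start --> end'."""
    start, end = timing_line.split("-->")
    return srt_time_to_seconds(end) - srt_time_to_seconds(start)

# Timing line copied from cue 10 above.
duration = cue_duration("00:05:26,510 --> 00:06:09,040")
print(f"{duration:.2f} s")
```

    The cue runs for roughly 42.5 seconds, far beyond the 5 seconds requested.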
  4. Video Damager VoodooFX
    Join Date
    Oct 2021
    Location
    At Doom9
    Originally Posted by loninappleton View Post
    run in Whisper with the small model in Subtitle Edit.
    There are five different Whisper implementations in SE, so saying just "Whisper" doesn't tell us much.

    Originally Posted by loninappleton View Post
    Please give details on more of this for installing the program elements, model etc.
    There is no "install"; it's ready to run after unpacking. Don't worry if you don't know how to download models manually: the model is downloaded automatically on the first run if it is not found.

    Originally Posted by loninappleton View Post
    My Folder for Whisper Fast is on the desktop.
    You don't want to copy portable programs to Windows folders, keep it somewhere like "D:\Faster-Whisper".

    Originally Posted by loninappleton View Post
    But I don't know how to get an answer on a separate question on manual editing with Subtitle Edit (to add graphics like music notes and such.)
    Questions about Subtitle Edit should go in the Subtitle Edit thread, or you can create a new one.
  5. Video Damager VoodooFX
    Join Date
    Oct 2021
    Location
    At Doom9
    Originally Posted by davexnet View Post
    I thought --vad_max_speech_duration_s 5
    would break it up into lines of speech 5 seconds maximum
    I preset VAD defaults that I find work well for movies; I don't recommend changing them.

    Try --verbose True, there you should see some debug output, check if that "20 seconds" segment is removed as no speech.
  6. @VoodooFX

    Thanks,

    I'll try not to mix topics from now on.
  7. A brief question on Whisper Faster used with Subtitle Edit:

    Is there a code sequence that does this, or is it in the standard Subtitle Edit presentation (screen)? It was noted in the other documentation that SE has the path installed for the required ffmpeg.

    --

    I moved the Whisper Faster folder to C:\

    I made a 5 min clip in MKVToolNix, also in the root.

    Looking at the folder contents I do see whisper.exe. Selecting that just briefly opens a CMD box. I admit I don't understand the workings.

    If there are additional or replacement installs for the binaries etc, please describe that.

    Also, the code above does not show a print to SRT but that may be in the folder content and called when needed.

    Time will hopefully work all this out for inexperienced users like me.
  8. Video Damager VoodooFX
    Join Date
    Oct 2021
    Location
    At Doom9
    You were provided with a guide in another thread, but for some reason you refused to watch it.
  9. Member
    Join Date
    Mar 2008
    Location
    United States
    Originally Posted by VoodooFX View Post
    Originally Posted by davexnet View Post
    I thought --vad_max_speech_duration_s 5
    would break it up into lines of speech 5 seconds maximum
    I preset VAD defaults that I find work well for movies; I don't recommend changing them.

    Try --verbose True, there you should see some debug output, check if that "20 seconds" segment is removed as no speech.
    Hi VoodooFX - here's the verbose output, I've not been able to spot the item you referred to.
  10. Video Damager VoodooFX
    Join Date
    Oct 2021
    Location
    At Doom9
    Originally Posted by davexnet View Post
    Hi VoodooFX - here's the verbose output, I've not been able to spot the item you referred to.
    It's there at the top:
    Code:
    DEBUG - VAD filter kept the following audio segments:....[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], [05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708].....
    Looks like there is a silence gap detected by VAD. Dunno why Whisper keeps them in one line; you'd better ask the dev in the Faster-Whisper repo.
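
    To make the gaps in that DEBUG line easier to see, here is a small Python sketch that parses the quoted segments and prints the time between them. It assumes the timestamps are in mm:ss.mmm form, as in the excerpt; logs for longer audio may use a different format. The function names are hypothetical and not part of Faster-Whisper.

```python
import re

# The four segments quoted from the DEBUG line above; the rest of the log is elided there.
LOG = ("[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], "
       "[05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708]")

# Assumes mm:ss.mmm timestamps, as seen in the excerpt.
SEGMENT = re.compile(r"\[(\d+):(\d+\.\d+) -> (\d+):(\d+\.\d+)\]")

def parse_segments(text):
    """Return (start, end) pairs in seconds for every [mm:ss.mmm -> mm:ss.mmm] item."""
    segments = []
    for m1, s1, m2, s2 in SEGMENT.findall(text):
        segments.append((int(m1) * 60 + float(s1), int(m2) * 60 + float(s2)))
    return segments

def gaps(segments):
    """Return the time between consecutive kept segments, in seconds."""
    return [round(nxt[0] - cur[1], 3) for cur, nxt in zip(segments, segments[1:])]

segments = parse_segments(LOG)
for (start, end), gap in zip(segments, gaps(segments)):
    print(f"kept {start:.3f}-{end:.3f}, then a gap of {gap:.3f} s")
```

    On the excerpt, the first gap (between 05:28.388 and 05:43.292) comes out at about 14.9 seconds, which is the stretch VAD appears to have dropped.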
  11. Member
    Join Date
    Mar 2008
    Location
    United States
    Originally Posted by VoodooFX View Post
    Originally Posted by davexnet View Post
    Hi VoodooFX - here's the verbose output, I've not been able to spot the item you referred to.
    It's there at the top:
    Code:
    DEBUG - VAD filter kept the following audio segments:....[05:25.564 -> 05:28.388], [05:43.292 -> 05:46.052], [05:49.628 -> 05:53.284], [05:56.412 -> 06:00.708].....
    Looks like there is a silence gap detected by VAD. Dunno why Whisper keeps them in one line; you'd better ask the dev in the Faster-Whisper repo.
    Ok, I'll see what I can find. Looking at the above, the woman says "Soldat" at 5:25 and "Guten Abend" at 5:50, with nothing in between. So according to the above, [05:43.292 -> 05:46.052] is a "kept" segment, yet it contains no speech.
  12. Video Damager VoodooFX
    Join Date
    Oct 2021
    Location
    At Doom9
    Some no-speech segments are kept, some speech segments are detected as silence, AI stuff is not perfect, yet.
  13. Originally Posted by VoodooFX View Post
    You were provided with a guide in another thread, but for some reason you refused to watch it.
    I did see the 15 second one (didn't load it). I am still a fussbudget.

    Also I did locate how to access 'discussion' on Github where I needed to make a new login and password. That is not totally resolved yet.

    I am downloading the 1.6 version from your page and will see how that goes.

    Following that, I should be able to copy in what you have on the CMD screen if I put a clip in the root where the program content is.
    Stop me wherever you like.

    Even if I have to bail on all this, I'm still glad that Whisper is finally working for me with the corrected instructions for use in Subtitle Edit. Would PowerShell add any convenience to reloading the command screen with a saved script? I still don't know anything, just asking things I think of.
  14. Originally Posted by VoodooFX View Post
    You were provided with a guide in another thread, but for some reason you refused to watch it.
    Apologies,

    I retraced the link for the CMD guide, which is 18 mins, not 15 secs. The 15-second one is what opened when I first looked, and I thought it was just about how to get to CMD.

    On my own I have Whisper-Faster installed at c:\ and checked it with just the couple of CMD commands I remember.

    I will likely practice the command string you present here at top just on paper which gives me better understanding of it. I have a new short clip at c:\ as well. I still expect error messages for bad technique or whatever.
  15. When I expect errors it's less stressful.

    As above, with everything in C:\, I wrote the one-line script, and here is the result:

    Directory of C:\

    08/17/2019 03:00 PM 23,460,413 Making War Horse TEST.m4v
    12/07/2019 04:14 AM <DIR> PerfLogs
    04/03/2023 05:06 PM <DIR> Program Files
    04/03/2023 05:10 PM <DIR> Program Files (x86)
    01/13/2023 03:20 PM <DIR> Python39
    01/04/2023 01:08 AM <DIR> Users
    05/03/2023 11:27 PM <DIR> Whisper-Faster
    05/04/2023 01:01 AM 0 whisper.exe
    05/03/2023 05:21 PM <DIR> Windows
    12/20/2022 05:19 PM <DIR> Windows.old
    12/24/2022 08:19 PM <DIR> Windows.old.000
    01/11/2023 08:15 PM <DIR> Windows.old.001
    09/29/2022 11:22 PM <DIR> You Tube How To all png images March 2 incomplete
    2 File(s) 23,460,413 bytes
    11 Dir(s) 917,225,558,016 bytes free

    C:\>Whisper-Faster>whisper.exe --language English --model "medium" "C:\Making War Horse TEST.m4v"
    'Whisper-Faster' is not recognized as an internal or external command,
    operable program or batch file.

    C:\>

    I'm only going on the example given.
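
    A note on the error above, offered as a likely explanation rather than a definitive diagnosis: in the example, "C:\Whisper-Faster>" is the prompt itself, not part of the command. Typed after "C:\>", the ">" character is treated by CMD as output redirection, so CMD looks for a program called "Whisper-Faster" and sends its output to a file named "whisper.exe". That would also explain the 0-byte whisper.exe in the listing. Changing into the folder first (cd Whisper-Faster) and then typing the command from "whisper.exe" onward avoids this. The same idea sketched in Python, assuming whisper.exe sits in C:\Whisper-Faster; the function names are made up for illustration, and only flags that appear in this thread are used:

```python
import subprocess
from pathlib import PureWindowsPath

# Paths taken from the directory listing above; adjust to your own setup.
WHISPER_DIR = PureWindowsPath(r"C:\Whisper-Faster")
MEDIA_FILE = PureWindowsPath(r"C:\Making War Horse TEST.m4v")

def build_command(whisper_dir, media_file, language="English", model="medium"):
    """Build the argument list; only flags that appear in this thread are used."""
    return [
        str(whisper_dir / "whisper.exe"),
        "--language", language,
        "--model", model,
        str(media_file),
    ]

def run_whisper(whisper_dir=WHISPER_DIR, media_file=MEDIA_FILE):
    """Run whisper.exe from inside its own folder (Windows only)."""
    command = build_command(whisper_dir, media_file)
    # cwd plays the role of 'cd Whisper-Faster' before typing the command.
    return subprocess.run(command, cwd=str(whisper_dir), check=True)

if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try anywhere.
    print(build_command(WHISPER_DIR, MEDIA_FILE))
```

    Passing the arguments as a list means the spaces in the file name need no extra quoting.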
  16. Member
    Join Date
    Dec 2022
    Location
    Lesotho
    @VoodooFX Wow, thank you very much. I ran your Windows binary with the large-v2 model and the results were outstanding. It took 3322 secs (on average) per 45 min episode, though. I haven't tinkered with any of the performance settings yet, so that's next on my list.