VideoHelp Forum




  1. I missed a couple of updates from 4.01 and just got them. Are there noticeable improvements in the Whisper model? It looks like SE and Purfview have spent considerable time on it, judging from the log notes. What is the preferred model to use with an old-style AMD Phenom processor?
  2. Video Damager VoodooFX (Join Date: Oct 2021, Location: At Doom9)
    That's an old CPU, and you'll probably be limited on RAM too. Basically, you don't want to use a model smaller than medium.
    Btw, Faster-Whisper-XXL is a bit faster on CPUs.
  3. I have another setup, an AMD Ryzen on an entry-level Asus board, but it isn't active right now; I need a new monitor for it. That one runs Win 10. It may be needed for adequate performance, or even just to download the model through SE. I had a problem getting a model on the old Win 7 setup; it gave an error, and I asked the SE developers about it.
  4. I keep getting text in the transcription that has nothing to do with the subs:

    Dialogue: 0,0:00:29.16,0:00:42.20,Default,,0,0,0,,© BF-WATCH TV\N2021 A little pause.

    Dialogue: 0,0:03:20.52,0:03:29.52,Default,,0,0,0,,A little pause...\Nand we are back.
    Dialogue: 0,0:03:38.56,0:03:46.56,Default,,0,0,0,,A little pause... and we are back.
    Dialogue: 0,0:04:19.76,0:04:27.76,Default,,0,0,0,,A little pause... and we are back.
    Dialogue: 0,0:04:57.70,0:04:59.06,Default,,0,0,0,,A little pause... and we are back.
    Dialogue: 0,0:05:21.46,0:05:29.46,Default,,0,0,0,,A little pause...
    Dialogue: 0,0:05:37.75,0:05:38.75,Default,,0,0,0,,and we are back.

    What's the cause of this? Do I have to make a donation to prevent it?
  5. It looks to me like this whole audio-to-text feature is fake.
    An insult to Subtitle Edit.
    Last edited by Mondriaan; 5th Jan 2025 at 11:34. Reason: Completion
  6. Member (Join Date: Mar 2021, Location: Israel)
    Originally Posted by Mondriaan View Post
    It looks to me like this whole audio-to-text feature is fake.
    An insult to Subtitle Edit.
    Use Whisper AI (Command Line Version)
    https://github.com/openai/whisper
  7. OK, I will try, thanks.
  8. Originally Posted by Mondriaan View Post
    OK, I will try, thanks.
    What are your computer's specifications?
    In particular, do you have a GPU, and how much VRAM does it have?
    Whisper AI has several accuracy levels that depend on the GPU's VRAM size.
  9. I have used Whisper AI with a large model, but I still don't like the results. I think free transcription is not something you can expect much from; maybe with paid methods you can get good results, but if this is supposed to be AI, I think I'll have to get a lot older before transcription works smoothly.
  10. Originally Posted by Mondriaan View Post
    I have used Whisper AI with a large model, but I still don't like the results. I think free transcription is not something you can expect much from; maybe with paid methods you can get good results, but if this is supposed to be AI, I think I'll have to get a lot older before transcription works smoothly.
    Can you tell us more about your experience with Whisper AI?

    I had a bit of difficulty at first when it generated the same text again and again, but I found that adding
    --condition_on_previous_text False
    made it possible to continue with the transcription.

    I think it is the best thing to happen in recent years. I can create subtitles for CD and DVD operas in different languages, and the results are amazing. I had to clean them up a bit after creation, but the best part is that it is totally free.
    I also have a lot of foreign movies without subtitles, and with Whisper AI I can now understand all of them.
    Sure, there are several paid options, but they cost a lot of money, and I think the transcription is done by third parties, so if you have any private audio or video, you don't really want to upload it outside your computer.
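    A full command using that flag might look like this; a sketch only, assuming the pip-installed openai-whisper CLI is on PATH, with an illustrative file name, model, and language:

    ```shell
    :: Hedged sketch (Windows cmd): openai-whisper with the anti-repetition flag.
    :: File name, model, and language are examples, not from this thread.
    whisper "opera_act1.mp3" --model large --language Italian --task transcribe ^
        --condition_on_previous_text False --output_format srt --output_dir subs
    ```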
  11. Originally Posted by VoodooFX View Post
    That's an old CPU, and you'll probably be limited on RAM too. Basically, you don't want to use a model smaller than medium.
    Btw, Faster-Whisper-XXL is a bit faster on CPUs.
    I concur; it works great for me. FWIW, here is the command from the batch file I have been using. It might not be ideal, but it gets the job done. I run the resulting SRT file through Subtitle Edit to clean it up.

    Code:
    D:\faster-whisper-xxl\faster-whisper-xxl.exe "<Input File Path>\<Input Filename>" --model=large-v2 --output_dir="<Output File Path>" --task=transcribe --language=English --vad_alt_method=pyannote_v3 --beep_off --sentence
  12. Member (Join Date: Apr 2007, Location: Australia)
    This is mine.
    Most of the parameters are 'in my face', so I don't forget what I'm currently working with. I don't use the = sign.
    Code:
    "D:\Whisper-XXL\faster-whisper-xxl.exe" "D:\a\video.mp4" --compute_type int8_float32 --model_dir "D:\Whisper-XXL\_models" --model medium.en ^
        --vad_filter true --vad_method pyannote_v3 --task transcribe --sentence --language en --beam_size 3 --best_of 3 --verbose true  --output_dir source
    Last edited by pcspeak; 6th Jan 2025 at 13:42.
  13. When I transcribe a file I get mixed Japanese and Russian, whether I use the --english parameter or not.
  14. I am impressed when I scan audio media (for example, MP3). Can I also put complete folders on the command line?
    Oh no, it would probably just make a mess.
    Last edited by Mondriaan; 7th Jan 2025 at 04:33. Reason: Consideration
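    For anyone wanting folders: cmd can loop over a folder's files and feed each one to Faster-Whisper-XXL. A sketch, assuming a Windows batch file; the exe path, folder, and model are illustrative:

    ```shell
    :: Hedged sketch (Windows .cmd/.bat file): transcribe every MP3 in a folder.
    :: The exe path, folder, and model choice are examples, not from this thread.
    for %%a in ("D:\audio\*.mp3") do (
        "D:\faster-whisper-xxl\faster-whisper-xxl.exe" "%%a" --model medium --language English --output_dir "D:\audio\subs"
    )
    ```

    Typed directly at an interactive prompt, %%a becomes %a.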
  15. Originally Posted by Mondriaan View Post
    When I transcribe a file I get mixed Japanese and Russian, whether I use the --english parameter or not.
    In Whisper AI, there is no --english parameter.
    The correct option is --language English
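    Sketched as commands (file names illustrative; the openai-whisper CLI is assumed):

    ```shell
    :: Hedged sketch: pinning the language in openai-whisper.
    :: Transcribe audio known to be English:
    whisper "movie.mkv" --model medium.en --language English
    :: Translate foreign-language audio into English subtitles instead:
    whisper "movie.mkv" --model medium --language Japanese --task translate
    ```

    Without --language, Whisper auto-detects the language from the first 30 seconds of audio, which is where mixed-language guesses can come from.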
  16. Originally Posted by pcspeak View Post
    This is mine.
    Most of the parameters are 'in my face', so I don't forget what I'm currently working with. I don't use the = sign.
    Code:
    "D:\Whisper-XXL\faster-whisper-xxl.exe" "D:\a\video.mp4" --compute_type int8_float32 --model_dir "D:\Whisper-XXL\_models" --model medium.en ^
        --vad_filter true --vad_method pyannote_v3 --task transcribe --sentence --language en --beam_size 3 --best_of 3 --verbose true --output_dir source
    Hey @pcspeak, can you briefly explain why you use the parameter options you have there? I wanted to understand what made you choose them. Also, about '--task transcribe': I take it this is just for efficiency? Otherwise, I assume Whisper tries to identify whether it's a transcription or translation job at the beginning and then fixes that for the rest of the job, rather than checking each chunk?
  17. Originally Posted by Moralez View Post
    Originally Posted by VoodooFX View Post
    That's an old CPU, and you'll probably be limited on RAM too. Basically, you don't want to use a model smaller than medium.
    Btw, Faster-Whisper-XXL is a bit faster on CPUs.
    I concur; it works great for me. FWIW, here is the command from the batch file I have been using. It might not be ideal, but it gets the job done. I run the resulting SRT file through Subtitle Edit to clean it up.

    D:\faster-whisper-xxl\faster-whisper-xxl.exe "<Input File Path>\<Input Filename>" --model=large-v2 --output_dir="<Output File Path>" --task=transcribe --language=English --vad_alt_method=pyannote_v3 --beep_off --sentence
    Hey @Moralez, also just wondering: why do you use the pyannote_v3 VAD? Is it the most accurate for your particular type of videos?
  18. Originally Posted by pcspeak View Post
    This is mine.
    Most of the parameters are 'in my face', so I don't forget what I'm currently working with. I don't use the = sign.
    Code:
    "D:\Whisper-XXL\faster-whisper-xxl.exe" "D:\a\video.mp4" --compute_type int8_float32 --model_dir "D:\Whisper-XXL\_models" --model medium.en ^
        --vad_filter true --vad_method pyannote_v3 --task transcribe --sentence --language en --beam_size 3 --best_of 3 --verbose true --output_dir source
    Hey @pcspeak, could you briefly explain why you use the parameters you have here? I just wanted to understand the reasoning for your use case. Or is it purely a balance of speed/efficiency against accuracy?

    I used to use an older version a year back and the subtitles were fine; since moving to the updated Whisper version, the subtitle lengths are ridiculous. I find I have to use either --sentence (which sometimes isn't great) or --standard (what are your thoughts on that, if you've tried it)?

    Also, does '--task transcribe' just make things more efficient for Whisper? Otherwise, I imagine it would not affect the word error rate (WER) if you left Whisper to work it out on its own?
    Last edited by Restricted; 15th Jan 2025 at 03:58.
  19. Originally Posted by Restricted View Post
    Hey @pcspeak, could you briefly explain why you use the parameters you have here?
    I've been using Faster-Whisper since mid-2023. The parameters I choose may change daily; the ones I quoted are my most common.
    It's not unusual for me to try 3-4 of my batch files on each video.
    Interestingly, the small model often gives results close to those of medium or large. I may start with the small model on videos I know don't have too much background noise; it's quick, and I can kill the job if I'm not happy.

    Code:
    "D:\Whisper-XXL\faster-whisper-xxl.exe" "%%a" --compute_type=int8_float16 --model_dir "D:\Whisper-XXL\_models" --model small.en ^
         --vad_filter false --task transcribe --sentence --language en --beam_size 5 --best_of 5 --verbose true -o source
    --task transcribe is the default, so it's not really needed. --sentence can be a bit hit or miss; for me, the medium model tends to give the best results with that parameter.
    I currently have 9 batch files that I may try . . . and I edit them to try something new.
    From their names you can probably work out the parameters used in each (bb5 = --beam_size 5 --best_of 5):

    largeV3-turbo_float32bb1.cmd
    largeV3-turbo_float32bb3-vadoff.cmd
    medium_float32bb3-Dakar.cmd
    medium_float32bb3.cmd
    medium_float32bb5-vadoff.cmd
    medium_float32bb8-vadoff.cmd
    small_float16bb5-vadoff.cmd
    small_float32bb5.cmd
    small_float32bb8-vadoff.cmd


    Under a number of circumstances I like to use the --initial_prompt parameter.
    e.g. For Formula 1, the following helps, but not a lot.
    Code:
    --initial_prompt="Max Verstappen, Lando Norris, Charles Leclerc, Oscar Piastri, Carlos Sainz, Yuki Tsunoda, Esteban Ocon, Franco Colapinto"
    Cheers.
  20. Originally Posted by Restricted View Post
    Originally Posted by Moralez View Post
    Originally Posted by VoodooFX View Post
    That's an old CPU, and you'll probably be limited on RAM too. Basically, you don't want to use a model smaller than medium.
    Btw, Faster-Whisper-XXL is a bit faster on CPUs.
    I concur; it works great for me. FWIW, here is the command from the batch file I have been using. It might not be ideal, but it gets the job done. I run the resulting SRT file through Subtitle Edit to clean it up.

    D:\faster-whisper-xxl\faster-whisper-xxl.exe "<Input File Path>\<Input Filename>" --model=large-v2 --output_dir="<Output File Path>" --task=transcribe --language=English --vad_alt_method=pyannote_v3 --beep_off --sentence
    Hey @Moralez, also just wondering: why do you use the pyannote_v3 VAD? Is it the most accurate for your particular type of videos?
    That option is described as "The best accuracy, supports CUDA."

    https://github.com/Purfview/whisper-standalone-win/discussions/231
  21. @loninappleton
    I realize that I'm not talking about Whisper in Subtitle Edit; sorry for partially hijacking your thread.
    Of course, all the parameters I use can be used in SE.

    Folks.
    I did some testing last night, and the results were interesting.
    I recorded 'Dakar Rally 2025 Ep11 Stage08' from our local FTA TV station.
    The following gave me a ~97% accurate transcription; the small.en model with the --sentence parameter worked very well.
    Except:
    I had to convert the recorded MPG file to an MKV using MKVToolNix GUI, dropping all but the video and audio streams.
    The small.en model worked better than the small model; I've no idea why.
    The names of the drivers and riders were a mess, as expected.

    small_float32bb8-vadoff.cmd
    Code:
    "D:\Whisper-XXL\faster-whisper-xxl.exe" "Dakar Rally 2025 Ep11 Stage08.mkv" ^
        --compute_type=int8_float32 ^
        --model_dir "D:\Whisper-XXL\_models" --model small.en ^
        --vad_filter false --task transcribe --language en --sentence --beam_size 8 --best_of 8 --verbose true -o source
    Cheers.