VideoHelp Forum




Results 31 to 60 of 65
  1. Got it. BTW, I found a great way to do the subs with Purfview's Faster-Whisper-XXL in SubtitleEdit (SE).

    1. Download ffmpeg and Purfview's Faster-Whisper-XXL under Video > Audio to text (Whisper)...

    2. In Advanced Parameters, enter: --compute_type int8 --beam_size 2 --best_of 1 --temperature 0 --threads 8 --standard --beep_off



    Add your video file (mkv/mp4) and press Generate.

    You can then refine the subs with SE batch processing (settings.xml attached).
  2. Member
    Join Date
    May 2008
    Location
    France
    Nice, thanks.

    SE whisper subs are nicely formatted, but they miss a lot of dialog.

    For example, with Kabul S01-E01, which has only forced subs for the foreign-language parts, the srt file from this whisper model is 16895 bytes long, while the Assembly one is 31317, almost double.

    Whisper has 238 lines, Assembly has 462.
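
    For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch. It assumes "lines" means subtitle cues and that the SRT is well formed, with one timing line per cue. count_cues is a hypothetical helper name, and the sample timestamps are made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def count_cues(srt_text: str) -> int:
    """Count subtitle cues by counting timing lines (hypothetical helper)."""
    return sum(1 for line in srt_text.splitlines() if TIMING.match(line.strip()))


# Illustrative sample only; the timestamps are invented.
sample = """1
00:00:01,000 --> 00:00:03,000
could not create or sustain a
durable Afghan government.

2
00:00:03,200 --> 00:00:06,000
I've concluded that it's time
to end America's longest war.
"""

print(count_cues(sample))
```

    Running count_cues on the text of each .srt file gives cue counts that can be compared directly, which is more telling than file size in bytes.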

    Assembly is not free, but 456 free hours will last me a long time. And as we saw, it does not work with music, but I don't need that.

    I also ran my stand-alone whisper with:

    "C:\Users\m1\AppData\Local\Programs\Python\Python311\Scripts\whisper.exe" "A:\proa\test\Kabul S01-E01.mkv" --model medium --task transcribe --word_timestamps True --device cuda

    It's maybe 50 times slower than the one with SE, and it gave me a 719 line SRT file, but full of artifacts with the same sub repeated many times.
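
    The repeated-sub artifact can be filtered out after the fact. A minimal Python sketch, assuming the cues are already parsed into (start, end, text) tuples; drop_consecutive_repeats is a hypothetical helper, not part of whisper or SE, and the sample cues are invented.

```python
from typing import List, Tuple

Cue = Tuple[str, str, str]  # (start, end, text); timestamps kept as SRT strings


def drop_consecutive_repeats(cues: List[Cue]) -> List[Cue]:
    """Keep only the first cue of each run of consecutive cues with the same text.

    Text is compared ignoring case and surrounding whitespace.
    """
    kept: List[Cue] = []
    for start, end, text in cues:
        if kept and kept[-1][2].strip().lower() == text.strip().lower():
            continue  # same text as the previous kept cue: treat as an artifact
        kept.append((start, end, text))
    return kept


# Illustrative cues only; timestamps and text are invented.
cues = [
    ("00:00:01,000", "00:00:02,000", "Hello"),
    ("00:00:02,000", "00:00:03,000", "Hello"),
    ("00:00:03,000", "00:00:04,000", "hello "),
    ("00:00:04,000", "00:00:05,000", "Goodbye"),
]
print(drop_consecutive_repeats(cues))
```

    This only catches back-to-back repeats; a line that is legitimately said twice in a row would also be dropped, so it is worth checking the result.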

    Overall, none of these subs feels natural.

    I'd say that Assembly is the least bad, actually watchable.

    I could not watch with the standalone whisper subs when I tried them before using Assembly; they looked too bad. With SE whisper, half of the dialog is missing.
  3. Quote:
    Whisper has 238 lines, Assembly has 462.

    [--model large-v2 --beam_size 5 --best_of 5]

    Try this: --model large-v2 --beam_size 5 --best_of 5 --task transcribe --word_timestamps True --temperature 0 --device cuda --standard --beep_off

    If [--beam_size 5 --best_of 5] gives bad results, switch to [--beam_size 2 --best_of 1].

    Assembly needs ACR correction, and they now give 350 hrs instead of 456 hrs.
    Last edited by sam12345; 26th Oct 2025 at 22:24.
  4. Quote:
    "C:\Users\m1\AppData\Local\Programs\Python\Python311\Scripts\whisper.exe" "A:\proa\test\Kabul S01-E01.mkv" --model medium --task transcribe --word_timestamps True --device cuda

    Install Faster-Whisper-XXL_r192.3.4_windows.7z from https://github.com/Purfview/whisper-standalone-win/releases

    Then run "faster-whisper-xxl.exe" from a CMD window opened in the installation path, or from any path after adding the installation path to the PATH environment variable.

    Get the CUDA libraries from here: https://github.com/Purfview/whisper-standalone-win/releases/tag/libs
    Last edited by sam12345; 26th Oct 2025 at 22:21.
  5. Member
    Join Date
    May 2008
    Location
    France
    Thanks, I'll try all that.

    I was having fun lately using Visual Studio 2022 to remotely debug a Linux project. VS2022 starts gdb remotely, and you get the nice UI in Windows to step through the code, see variables, set breakpoints, etc.

    Do you do development on both Windows and Linux?

    If you do, I'll share.
  6. Member
    Join Date
    May 2008
    Location
    France
    I tested Faster-Whisper-XXL_r245.1. It's much faster and better than the whisper.exe I was using, thanks.

    But it's still missing a lot of dialog.

    Example with Assembly:

    Code:
    And that more and endless American
    
    military force could not create or sustain
    
    a durable Afghan government. I've
    
    concluded that it's time to end America's
    
    longest war.
    With whisper:

    Code:
    could not create or sustain a
    durable Afghan government.
    
    I've concluded that it's time
    to end America's longest war.
    The whisper output is much better formatted, with fewer subs for the same content and phrases grouped together, but it missed the beginning.

    Edit: I'll test other models.
    Last edited by robena; 27th Oct 2025 at 03:34.
  7. Use this command in CMD:


    faster-whisper-xxl.exe "My_Video.mkv" ^
    --task transcribe ^
    -l en ^
    -m large-v2 ^
    --device cuda ^
    --word_timestamps True ^
    --compute_type float16 ^
    --vad_alt_method silero_v4 ^
    --beam_size 2 ^
    --best_of 1 ^
    --temperature 0 ^
    --threads 8 ^
    --max_line_count 2 ^
    --max_line_width 32 ^
    --output_dir "source"


    Replace 'My_Video.mkv' with your video file. You can adjust '--threads 8' to suit your system's CPU.

    --max_line_count 2
    --max_line_width 32

    OR

    --max_line_width 36

    will balance the lines.
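
    To see what these two limits do to a line of text, here is a small Python sketch using the standard textwrap module. It only illustrates the effect of a width limit and a line-count limit; it is not how Faster-Whisper-XXL splits lines internally, and wrap_cue is a hypothetical helper name.

```python
import textwrap
from typing import List


def wrap_cue(text: str, max_line_width: int = 32, max_line_count: int = 2) -> List[str]:
    """Wrap text to max_line_width, then group lines into cues of max_line_count lines."""
    lines = textwrap.wrap(text, width=max_line_width)
    return [
        "\n".join(lines[i:i + max_line_count])
        for i in range(0, len(lines), max_line_count)
    ]


for cue in wrap_cue("could not create or sustain a durable Afghan government."):
    print(cue)
    print()
```

    A wider limit such as 36 lets more words fit on the first line, so trying both values on a few long sentences shows which one gives the more even split.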
  8. Member
    Join Date
    May 2008
    Location
    France
    I tried the --diarize reverb_v2 model, supposed to be the most accurate. Still missing a lot of dialog.

    Edit: I did not see your post above while posting this. I will try it.
  9. Does Assembly accept these flags?

    --task transcribe ^
    -l en ^
    -m large-v2 ^
    --device cuda ^
    --word_timestamps True ^
    --compute_type float16 ^
    --vad_alt_method silero_v4 ^
    --beam_size 2 ^
    --best_of 1 ^
    --temperature 0 ^
    --threads 8 ^
    --max_line_count 2 ^
    --max_line_width 32 ^

    SE accepts all of them except vad_method silero_v4.
  10. Member
    Join Date
    May 2008
    Location
    France
    I tried your settings, thanks, still missing a lot of dialog.

    But it's free and nicely formatted, so, usable if you have a lot of files to process.

    No idea for Assembly, you could ask their support.
  11. Quote:
    I tested Faster-Whisper-XXL_r245.1. It's much faster and better than the whisper.exe I was using, thanks.

    Use version r192.3.4; note that this version does not accept --diarize reverb_v2.
  12. Member
    Join Date
    May 2008
    Location
    France
    I will, and I'll report if it's better on this particular clip.
  13. Try model "medium"
    Last edited by sam12345; 27th Oct 2025 at 09:56.
  14. Try this in Faster-Whisper-XXL_r245.4:

    faster-whisper-xxl.exe "My_Video.mkv" ^
    --task transcribe ^
    -l en ^
    -m medium ^
    --device cuda ^
    --compute_type float16 ^
    --vad_method silero_v4 ^
    --beam_size 2 ^
    --best_of 1 ^
    --temperature 0 ^
    --threads 8 ^
    --max_line_count 2 ^
    --max_line_width 36 ^
    --output_dir "source"

    Kindly adjust the number of threads to suit your system's CPU. Version r245.4 accepts "vad_method silero_v4", while version r192.3 accepts "vad_alt_method silero_v4".
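
    If you script the command for both versions, the flag name can be picked from the release tag. A minimal Python sketch; vad_args is a hypothetical helper, and the cutoff at release 245 is an assumption based only on the two versions mentioned here (the exact release where the flag was renamed is not known from this thread).

```python
from typing import List


def vad_args(release: str, vad_model: str = "silero_v4") -> List[str]:
    """Return the VAD arguments for a release tag such as 'r245.4' (hypothetical helper)."""
    major = int(release.lstrip("rR").split(".")[0])
    # ASSUMPTION: releases from 245 on use --vad_method, older ones --vad_alt_method.
    # Only r245.4 and r192.3 are confirmed in this thread.
    flag = "--vad_method" if major >= 245 else "--vad_alt_method"
    return [flag, vad_model]


print(vad_args("r245.4"))
print(vad_args("r192.3.4"))
```

    The returned list can be appended to the rest of the command-line arguments when launching the exe from a script.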
  15. Member
    Join Date
    May 2008
    Location
    France
    I have a Threadripper 9960X, so I used 48 threads.

    CPU load was between 10 and 35%. GPU at 30%. It runs FAST!

    Whisper does better grouping; subs look less AI-generated than with Assembly.

    But there are artifacts: the same sub repeated, and some subs with the wrong timestamp, displayed 20 seconds too soon.

    I thought this might be caused by a language change. For this video, I have the forced subs for the foreign dialog and only need subs for the English dialog, to be merged afterwards with SE.

    So I used:

    --model medium.en --language en --task transcribe --device cuda --compute_type float16 --vad_method silero_v4 --beam_size 1 --temperature 0 --threads 48 --initial_prompt "Spoken in English." --condition_on_previous_text False --max_line_count 2 --max_line_width 40 --output_dir "E:\Proe"

    Most of the foreign part was ignored, but not all. Total number of subs is close to Assembly this time.

    I did not check every sub, but the 20 s error after a language change at this particular location did not happen.

    So, with multiple languages, use one pass for each and then merge using SE.

    I'll watch an entire episode made with whisper. If I see bad things, I'll compare with Assembly and report.

    And output anyway needs to be processed by SE to correct some usual errors, mostly timing.

    Thanks!
  16. Member
    Join Date
    May 2008
    Location
    France
    Edit: this fails on some files with:

    Could not find codec parameters for stream 2 (Subtitle: hdmv_pgs_subtitle (pgssub)): unspecified size

    Solution is to extract audio first:

    ffmpeg.exe -analyzeduration 200M -probesize 200M -i "your_file.mkv" -vn -acodec pcm_s16le -ar 16000 -ac 1 "audio.wav"

    and process audio.wav.
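
    For a batch of files, the same extraction step can be scripted. A minimal Python sketch that builds the exact ffmpeg arguments shown above; build_extract_cmd and extract_audio are hypothetical helper names, and it assumes ffmpeg is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_cmd(video: str, wav: str = "audio.wav") -> List[str]:
    """Build the ffmpeg command that extracts 16 kHz mono PCM audio from a video."""
    return [
        "ffmpeg",
        "-analyzeduration", "200M",  # give ffmpeg more data to analyze the streams
        "-probesize", "200M",
        "-i", video,
        "-vn",                        # drop the video stream
        "-acodec", "pcm_s16le",       # 16-bit PCM
        "-ar", "16000",               # 16 kHz sample rate
        "-ac", "1",                   # mono
        wav,
    ]


def extract_audio(video: str, wav: str = "audio.wav") -> Path:
    """Run ffmpeg and return the path of the extracted WAV file."""
    subprocess.run(build_extract_cmd(video, wav), check=True)
    return Path(wav)


print(" ".join(build_extract_cmd("your_file.mkv")))
```

    Calling extract_audio("your_file.mkv") runs the command; with check=True, a failing ffmpeg run raises an exception instead of passing silently.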
  17. Try --beam_size 2 --best_of 1
    Last edited by sam12345; 27th Oct 2025 at 23:17.
  18. Member
    Join Date
    May 2008
    Location
    France
    That was useful.

    I am running now:

    --model medium.en --language en --task transcribe --device cuda --compute_type float16 --vad_method silero_v5_fw --vad_threshold 0.4 --vad_min_speech_duration_ms 200 --vad_min_silence_duration_ms 300 --hallucination_silence_threshold 0.3 --no_speech_threshold 0.6 --logprob_threshold -2.0 --beam_size 3 --best_of 1 --temperature 0.1 --repetition_penalty 1.05 --no_repeat_ngram_size 3 --condition_on_previous_text False --language_detection_segments 5 --language_detection_threshold 0.85 --max_line_count 2 --max_line_width 40

    I'll have to watch an entire episode, but it seems to have suppressed the hallucinations for foreign languages.

    And as said before, subs look much more human made than Assembly ones, using this model was a very good suggestion.

    I may try:

    --model large-v3 --task translate --multilingual --device cuda --compute_type float16 --vad_method silero_v5_fw --vad_threshold 0.4 --beam_size 3 --best_of 1 --temperature 0.1 --repetition_penalty 1.05 --no_repeat_ngram_size 3 --condition_on_previous_text False --hallucination_silence_threshold 0.3 --no_speech_threshold 0.6 --logprob_threshold -2.0 --compression_ratio_threshold 1.2 --max_line_count 2 --max_line_width 40 --output_format srt

    to translate everything, but since here I have human made subs for the foreign language, it's best to merge them.

    Having fun trying all that here!
  19. --model large-v3 and --model large-v2 are less compatible with Faster-Whisper-XXL. Stick to model medium and set --temperature to 0. The rest is fine. Do report how it compares to AssemblyAI.
    Last edited by sam12345; 28th Oct 2025 at 08:57.
  20. @robena

    For ffmpeg tasks you can try Clever FFmpeg GUI. It's very nicely designed.

    Quote:
    For example with Kabul S01-E01 that has only forced subs for foreign language

    Try these subs: https://drive.google.com/drive/folders/1Mx_IHMIR452NP1104NS2mwsUtY3vS1AZ?usp=sharing
    Last edited by sam12345; 28th Oct 2025 at 11:03.
  21. Member
    Join Date
    May 2008
    Location
    France
    Thanks, these are the foreign language only subs similar to the ones I have.

    That's why I need to use whisper to generate the English-speaking ones.
    Last edited by robena; 28th Oct 2025 at 12:07.
  22. Member
    Join Date
    May 2008
    Location
    France
    Originally Posted by sam12345
    These are complete, thanks.

    Did you generate them yourself?

    With what switches for whisper if you did?

    I gave up on generating English subs only; that's not reliable, so I'm trying to translate everything myself.

    Your subs seem great, I'll have to watch fully to confirm.
  23. Yes, I made them myself. Steps:

    1. Extract the audio of all 6 episodes with Clever FFmpeg GUI.

    1A. Add all of them in SE.

    2. Use SE 4.0.14 (latest) with model "medium", not "medium.en". Medium covers all the languages.

    3. Run a single command in SE "Advanced": --task translate --word_timestamps True --compute_type int8 --vad_method silero_v4 --beam_size 2 --best_of 1 --temperature 0 --threads 8 --standard --beep_off



    Finally, run SE "Batch" for the final touch-up (Settings.xml attached).
  24. Use SubSync to sync the subs with video if they don't match
  25. Member
    Join Date
    May 2008
    Location
    France
    Great find with SE, here is why it works better than my batch tries:

    Inside SE, Purfview's Whisper plug-in doesn't feed the whole file to the binary at once.
    It slices the audio internally (usually 30–60 s chunks with ~1 s overlap) and calls Faster-Whisper-XXL repeatedly.

    That prevents whisper from missing the dialog that it misses when it processes the whole file at once.
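
    To make the chunking idea concrete, here is a minimal Python sketch that computes overlapping chunk boundaries. It takes the description above at face value; the 30 s chunk length and 1 s overlap are just defaults picked from that description, and chunk_bounds is a hypothetical helper, not SE's actual code.

```python
from typing import List, Tuple


def chunk_bounds(
    total_s: float, chunk_s: float = 30.0, overlap_s: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks covering total_s."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be >= 0 and smaller than chunk_s")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s  # step back so consecutive chunks overlap
    return bounds


print(chunk_bounds(70))
```

    Each (start, end) pair could then be cut out with ffmpeg and transcribed separately, with the cue times shifted by the chunk's start before merging.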

    It's simple enough that I don't need to reinvent the wheel; I'll just use that.

    Just a tiny problem. When I use your exact settings, with the same SE version, the subs I get miss a small amount of dialog compared to yours.

    Maybe because I have a different MKV version? Time stamps are not the same.

    Anyway, all this is VERY useful. I look forward to testing SubSync.

    I have thousands of old series episodes with uppercase subs, and when getting proper lowercase ones, they don't sync.

    I learned to sync them manually pretty fast, but an auto process will be great.
  26. Quote:
    Just a tiny problem. When I use your exact settings, with the same SE version, the subs I get miss a small amount of dialog compared to yours. Maybe because I have a different MKV version? Time stamps are not the same.

    Use a Python script to overcome that. Steps:

    1. Install Python 3.11.9 (skip this if you already have it).

    2. Extract the foreign-language subs from the video file.

    3. Place the two .srt files (A: made with SE; B: the foreign-language subs extracted from the video file) in a folder together with the merge_subs.py script attached below.

    4. Rename the srt files to:

    complete_parts.srt (made with SE; the larger file)

    missing_parts.srt (the extracted foreign-language one; the smaller file)

    Finally, open CMD in that folder (assuming Python was added to PATH during installation) and run:

    python merge_subs.py

    You will get the final merged_output.srt file, in which the missing parts are filled in from the extracted foreign-language subs. Use that .srt file.
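
    For readers without the attachment, here is a rough Python sketch of what such a merge could look like. This is NOT the attached merge_subs.py; it is a guess at the idea under one assumption: a cue from missing_parts.srt is added only if it does not overlap in time with any cue in complete_parts.srt. All function names are hypothetical.

```python
import re
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start ms, end ms, text)

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> List[Cue]:
    """Parse SRT text into (start, end, text) cues; blocks without a timing line are skipped."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                cues.append((_to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[i + 1:]).strip()))
                break
    return cues


def merge_cues(complete: List[Cue], missing: List[Cue]) -> List[Cue]:
    """ASSUMED rule: add a 'missing' cue only if it overlaps no 'complete' cue in time."""
    merged = list(complete)
    for start, end, text in missing:
        if not any(start < c_end and c_start < end for c_start, c_end, _ in complete):
            merged.append((start, end, text))
    return sorted(merged)


def _fmt(ms: int) -> str:
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Serialize cues back to SRT text, renumbering from 1."""
    return "\n".join(
        f"{n}\n{_fmt(a)} --> {_fmt(b)}\n{t}\n" for n, (a, b, t) in enumerate(cues, 1)
    )


def merge_files(complete_path: str, missing_path: str, out_path: str) -> None:
    """Read both SRT files, merge them, and write the result."""
    with open(complete_path, encoding="utf-8") as f:
        complete = parse_srt(f.read())
    with open(missing_path, encoding="utf-8") as f:
        missing = parse_srt(f.read())
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(to_srt(merge_cues(complete, missing)))


# Example call, using the file names from the steps above:
# merge_files("complete_parts.srt", "missing_parts.srt", "merged_output.srt")
```

    The overlap rule is the main design choice here: it avoids showing two subs at once, at the cost of dropping a foreign-language cue whenever the SE file already has something at that time.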

    Do use SubSync for syncing. It's very good.

    Quote:
    I learned to sync them manually pretty fast

    Do tell me how you do that manually.

    EDIT: I attached merge_subs.py as a .TXT file. Convert it into a .py file at your end: open it in Notepad and save it as "All files" with the name merge_subs.py.
    Last edited by sam12345; 29th Oct 2025 at 00:40.
  27. EDIT:

    Don't forget to do the final touch-up with SE "Batch" (Settings.xml attached).
  28. The CLI run from CMD works fine here, and it was even more accurate than SE. Text that was missing from 00:06:30 to 00:08:00 in S01E01 appeared with the CLI command.

    faster-whisper-xxl.exe "Kabul - 1x01 - The Fall$_2.eng.aac" ^
    --task translate ^
    --word_timestamps True ^
    --model medium ^
    --device cpu ^
    --compute_type int8 ^
    --vad_method silero_v4 ^
    --beam_size 2 ^
    --best_of 1 ^
    --temperature 0 ^
    --threads 8 ^
    --output_dir "source"





    Run the merge_subs.py script to produce merged_output.srt, and do the final touch-up with SE batch using settings.xml.
    Last edited by sam12345; 29th Oct 2025 at 04:32.


