Got it. BTW, found the perfect way to do the subs with Purfview Faster-Whisper XXL in SubtitleEdit:
1. Download ffmpeg and Purfview Faster-Whisper XXL under Video > Audio to text (Whisper...).
2. In Advanced parameters, use: --compute_type int8 --beam_size 2 --best_of 1 --temperature 0 --threads 8 --standard --beep_off
[Attachment 89384]
3. Add your video file (MKV/MP4) and press Generate.
You can refine the results afterwards with SE's Batch convert using the settings.xml attached herewith.
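If you'd rather script that refinement step than click through Batch convert, SE also has a command-line convert mode. A minimal sketch, assuming SubtitleEdit.exe is reachable on PATH and that /fixcommonerrors roughly covers what the attached settings.xml does (that part is an assumption):
Code:
# Rough sketch: batch-run SubtitleEdit's command-line convert over the generated SRTs.
# Assumes SubtitleEdit.exe is on PATH; the attached settings.xml may apply more/other
# fixes than /fixcommonerrors does.
import glob
import subprocess

for srt in glob.glob(r"A:\proa\test\*.srt"):  # example folder, change to yours
    subprocess.run(
        ["SubtitleEdit.exe", "/convert", srt, "subrip", "/fixcommonerrors", "/overwrite"],
        check=True,
    )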
-
Nice, thanks.
SE whisper subs are nicely formatted, but they miss a lot of dialogue.
For example, with Kabul S01-E01, which has only forced subs for the foreign-language parts: when I run this whisper model, the SRT file is 16,895 bytes, while the Assembly one is 31,317 bytes, almost double.
Whisper has 238 lines, Assembly has 462.
Assembly is not free, but 456 free hours will last me a long time. And as we saw, it does not work with music, but I don't need that.
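If anyone wants to compare outputs the same way, here is a quick sketch that just counts the cues in two SRT files (the file names are only examples):
Code:
# Rough sketch: count the subtitle cues in two SRT files to compare outputs.
import re

def count_cues(path):
    # Each cue has a "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line; count those.
    text = open(path, encoding="utf-8-sig", errors="replace").read()
    return len(re.findall(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}", text))

for name in ("whisper.srt", "assembly.srt"):  # example file names
    print(name, count_cues(name), "cues")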
I also ran my stand-alone whisper with:
"C:\Users\m1\AppData\Local\Programs\Python\Python311\Scripts\whisper.exe" "A:\proa\test\Kabul S01-E01.mkv" --model medium --task transcribe --word_timestamps True --device cuda
It's maybe 50 times slower than the one with SE, and it gave me a 719-line SRT file, but it's full of artifacts, with the same sub repeated many times.
Overall, none of these subs feels natural.
I'd say that Assembly is the least bad, actually watchable.
I could not watch with the standalone whisper subs when I tried them before using Assembly; they looked too bad. With SE whisper, half of the dialogue is missing. -
Quote: Whisper has 238 lines, Assembly has 462 ones
Try it! > --model large-v2 --beam_size 5 --best_of 5 --task transcribe --word_timestamps True --temperature 0 --device cuda --standard --beep_off
If [--beam_size 5 --best_of 5] gives bad results, switch to [--beam_size 2 --best_of 1].
Assembly needs ACR correction, and they give 350 hrs now instead of 456 hrs.
-
Quote: "C:\Users\m1\AppData\Local\Programs\Python\Python311\Scripts\whisper.exe" "A:\proa\test\Kabul S01-E01.mkv" --model medium --task transcribe --word_timestamps True --device cuda
Install Faster-Whisper-XXL_r192.3.4_windows.7z from https://github.com/Purfview/whisper-standalone-win/releases
and run "faster-whisper-xxl.exe" from the installation path > CMD, or from any path > CMD after adding the installation path to the Environment Variables.
Take the CUDA libraries from here: https://github.com/Purfview/whisper-standalone-win/releases/tag/libs
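A tiny sketch to confirm the exe is actually reachable after editing the environment variables (the install path below is just an example; the CUDA DLL names vary per release, so only the exe is checked here):
Code:
# Rough sketch: check that faster-whisper-xxl.exe is reachable, falling back to a
# hypothetical install folder for this process only.
import os
import shutil

INSTALL_DIR = r"C:\Tools\Faster-Whisper-XXL"  # example path, use your own

exe = shutil.which("faster-whisper-xxl.exe")
if exe is None:
    os.environ["PATH"] = INSTALL_DIR + os.pathsep + os.environ["PATH"]
    exe = shutil.which("faster-whisper-xxl.exe")

print("Found:", exe or "not found - recheck the Environment Variables step above")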
-
Thanks, I'll try all that.
I was having fun lately using Visual Studio 2022 to debug remotely a Linux project. VS2022 starts remotely gdb, and you get the nice UI in Windows to step through the code, see variables, set breakpoints, etc...
Do you do development on both Windows and Linux?
If you do, I'll share. -
I tested Faster-Whisper-XXL_r245.1. It's much faster and better than the whisper.exe I was using, thanks.
But it's still missing a lot of dialog.
Example, compared with Assembly.
With Assembly:
Code:
And that more and endless American military force could not create or sustain a durable Afghan government. I've concluded that it's time to end America's longest war.
With whisper:
Code:
could not create or sustain a durable Afghan government. I've concluded that it's time to end America's longest war.
This whisper output is much better formatted (fewer subs for the same content, grouping phrases), but it missed the beginning.
Edit: I'll test other models.
-
Use this command > CMD
faster-whisper-xxl.exe "My_Video.mkv" ^
--task transcribe ^
-l en ^
-m large-v2 ^
--device cuda ^
--word_timestamps True ^
--compute_type float16 ^
--vad_alt_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--max_line_count 2 ^
--max_line_width 32 ^
--output_dir "source"
Change 'My_Video.mkv' to your video file; you can adjust '--threads 8' to match your system's CPU.
--max_line_count 2
--max_line_width 32
OR
--max_line_width 36
will balance the lines. (A batch version for a whole folder is sketched below.)
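If you have a whole folder to do, a rough Python wrapper around the exact command above could look like this (the folder path is an example and it assumes the exe is on PATH):
Code:
# Rough sketch: run the command above on every MKV in a folder.
# Assumes faster-whisper-xxl.exe is on PATH; flags copied from the post above.
import glob
import subprocess

FLAGS = [
    "--task", "transcribe", "-l", "en", "-m", "large-v2", "--device", "cuda",
    "--word_timestamps", "True", "--compute_type", "float16",
    "--vad_alt_method", "silero_v4", "--beam_size", "2", "--best_of", "1",
    "--temperature", "0", "--threads", "8",
    "--max_line_count", "2", "--max_line_width", "32",
    "--output_dir", "source",  # "source" as used in the post above
]

for video in glob.glob(r"D:\videos\*.mkv"):  # example folder
    subprocess.run(["faster-whisper-xxl.exe", video] + FLAGS, check=True)
-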
I tried the --diarize reverb_v2 model, supposed to be the most accurate. Still missing a lot of dialog.
Edit: I did not see your post above while posting this. I will try it. -
Does Assembly accept these flags?
--task transcribe ^
-l en ^
-m large-v2 ^
--device cuda ^
--word_timestamps True ^
--compute_type float16 ^
--vad_alt_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--max_line_count 2 ^
--max_line_width 32 ^
SE accepts all of them, but not --vad_method silero_v4. -
I tried your settings, thanks, still missing a lot of dialog.
But it's free and nicely formatted, so, usable if you have a lot of files to process.
No idea for Assembly, you could ask their support. -
Quote: I tested Faster-Whisper-XXL_r245.1. It's much faster and better than the whisper.exe I was using, thanks.
The version I use is r192.3.4, and this version does not accept --diarize reverb_v2.
-
Try this > in Faster-Whisper-XXL_r245.4
faster-whisper-xxl.exe "My_Video.mkv" ^
--task transcribe ^
-l en ^
-m medium ^
--device cuda ^
--compute_type float16 ^
--vad_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--max_line_count 2 ^
--max_line_width 36 ^
--output_dir "source"
Kindly update the number of threads as per your system CPU. This version r245.4 accepts "vad_method silero_v4" and version r192.3 accepts "vad_alt_method silero_v4" -
I have a Threadripper 9960X, so I used 48 threads.
CPU load was between 10 and 35%. GPU at 30%. It runs FAST!
Whisper does better grouping; the subs look less AI-generated than with Assembly.
But there are artifacts: the same sub repeated, and some subs have the wrong timestamp, being displayed 20 seconds too soon.
I thought this might be caused by a language change. For this video, I have the forced subs for the foreign dialogue and only need subs for the English one, to be merged then with SE.
So I used:
--model medium.en --language en --task transcribe --device cuda --compute_type float16 --vad_method silero_v4 --beam_size 1 --temperature 0 --threads 48 --initial_prompt "Spoken in English." --condition_on_previous_text False --max_line_count 2 --max_line_width 40 --output_dir "E:\Proe"
Most of the foreign part was ignored, but not all. Total number of subs is close to Assembly this time.
I did not check every sub, but the 20-second errors after a language change at this particular location did not happen.
So, with multiple languages, use one pass for each and then merge using SE.
I'll watch an entire episode made with whisper. If I see bad things, I'll compare with Assembly and report.
And output anyway needs to be processed by SE to correct some usual errors, mostly timing.
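For the repeated-sub artifacts specifically, a small script pass before SE can also thin them out; a rough sketch, assuming plain SRT blocks separated by blank lines:
Code:
# Rough sketch: drop cues whose text exactly repeats the previous cue's text,
# assuming plain SRT blocks separated by blank lines.
def dedup_srt(in_path, out_path):
    blocks = open(in_path, encoding="utf-8-sig").read().strip().split("\n\n")
    kept, last_text = [], None
    for block in blocks:
        lines = block.splitlines()
        text = "\n".join(lines[2:])  # everything after the index and timing lines
        if text != last_text:
            kept.append(lines)
            last_text = text
    with open(out_path, "w", encoding="utf-8") as out:
        for i, lines in enumerate(kept, 1):
            out.write(f"{i}\n" + "\n".join(lines[1:]) + "\n\n")  # renumber, keep timing + text

dedup_srt("whisper_output.srt", "whisper_dedup.srt")  # example file names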
Thanks! -
Edit: this fails on some files with:
Could not find codec parameters for stream 2 (Subtitle: hdmv_pgs_subtitle (pgssub)): unspecified size
Solution is to extract audio first:
ffmpeg.exe -analyzeduration 200M -probesize 200M -i "your_file.mkv" -vn -acodec pcm_s16le -ar 16000 -ac 1 "audio.wav"
and process audio.wav. -
try --beam_size 2 --best_of 1
-
That was useful.
I am running now:
--model medium.en --language en --task transcribe --device cuda --compute_type float16 --vad_method silero_v5_fw --vad_threshold 0.4 --vad_min_speech_duration_ms 200 --vad_min_silence_duration_ms 300 --hallucination_silence_threshold 0.3 --no_speech_threshold 0.6 --logprob_threshold -2.0 --beam_size 3 --best_of 1 --temperature 0.1 --repetition_penalty 1.05 --no_repeat_ngram_size 3 --condition_on_previous_text False --language_detection_segments 5 --language_detection_threshold 0.85 --max_line_count 2 --max_line_width 40
I'll have to watch an entire episode, but it seems to have suppressed the hallucinations for foreign languages.
And as said before, the subs look much more human-made than the Assembly ones; using this model was a very good suggestion.
I may try:
--model large-v3 --task translate --multilingual --device cuda --compute_type float16 --vad_method silero_v5_fw --vad_threshold 0.4 --beam_size 3 --best_of 1 --temperature 0.1 --repetition_penalty 1.05 --no_repeat_ngram_size 3 --condition_on_previous_text False --hallucination_silence_threshold 0.3 --no_speech_threshold 0.6 --logprob_threshold -2.0 --compression_ratio_threshold 1.2 --max_line_count 2 --max_line_width 40 --output_format srt
to translate everything, but since here I have human made subs for the foreign language, it's best to merge them.
Having fun trying all that here! -
--model large-v3 and --model large-v2 are less compatible with Faster-Whisper XXL. Stick to model medium and --temperature 0. The rest is all fine. Do report how it compares to Assembly AI.
-
@robena
For ffmpeg tasks you can try Clever FFmpeg GUI. It's very nicely designed.
Quote: For example with Kabul S01-E01 that has only forced subs for foreign language
Try these subs > https://drive.google.com/drive/folders/1Mx_IHMIR452NP1104NS2mwsUtY3vS1AZ?usp=sharing
-
Try these > you won't regret it
https://drive.google.com/drive/folders/1vueh2NFWjSPRkKFhhRRoezuILVy_wBHI?usp=sharing -
These are complete, thanks.
Did you generate them yourself?
With what switches for whisper if you did?
I gave up on generating English-only subs, that's not reliable, so I'm now trying to translate everything myself.
Your subs seem great, I'll have to watch fully to confirm. -
Yes, I did it myself. Steps:
1. Extract the audio of all 6 episodes with Clever FFmpeg GUI.
1a. Add all of them in SE.
2. SE 4.0.14 (latest), with model "medium", not "medium.en". Medium has all the languages.
3. With a single command in SE "Advanced" > --task translate --word_timestamps True --compute_type int8 --vad_method silero_v4 --beam_size 2 --best_of 1 --temperature 0 --threads 8 --standard --beep_off
[Attachment 89427]
Finally "batch" with SE for final touchup. Settings.xml attached herewith -
Great find with SE, here is why it works better than my batch tries:
Inside SE, Purfview's Whisper plug-in doesn't feed the whole file to the binary at once.
It slices the audio internally (usually 30-60 s chunks with ~1 s overlap) and calls Faster-Whisper-XXL repeatedly.
That prevents whisper from dropping the dialogue it misses when it processes the whole file in one go.
It's simple enough that I don't need to reinvent the wheel, so I'll just use that.
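I don't know the plug-in's exact chunking details, but as a rough sketch of the idea (chunk length, overlap and the whisper flags below are guesses):
Code:
# Rough sketch of the overlap-chunking idea: cut the audio into overlapping pieces
# with ffmpeg and transcribe each piece on its own. Re-merging the per-chunk SRTs
# (offsetting timestamps, dropping overlap duplicates) is the part the plug-in
# presumably does for you.
import subprocess

AUDIO = "audio.wav"          # e.g. extracted with the ffmpeg command mentioned earlier
CHUNK, OVERLAP = 60.0, 1.0   # guessed values

dur = float(subprocess.check_output(
    ["ffprobe", "-v", "error", "-show_entries", "format=duration",
     "-of", "default=noprint_wrappers=1:nokey=1", AUDIO], text=True).strip())

start, idx = 0.0, 0
while start < dur:
    piece = f"chunk_{idx:04d}.wav"
    subprocess.run(["ffmpeg", "-y", "-ss", str(start), "-t", str(CHUNK + OVERLAP),
                    "-i", AUDIO, piece], check=True)
    subprocess.run(["faster-whisper-xxl.exe", piece, "--task", "transcribe",
                    "--model", "medium", "--output_dir", "source"], check=True)
    start += CHUNK
    idx += 1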
Just a tiny problem. When I use your exact settings, using the same SE version, the subs I get miss a small number of dialogs compared to yours.
Maybe because I have a different MKV version? Time stamps are not the same.
Anyway, all this is VERY useful. I look forward to testing SubSync.
I have thousands of old series episodes with uppercase subs, and when getting proper lowercase ones, they don't sync.
I learned to sync them manually pretty fast, but an auto process will be great. -
Quote: Just a tiny problem. When I use your exact settings, using the same SE version, the subs I get miss a small number of dialogs compared to yours. Maybe because I have a different MKV version? Time stamps are not the same.
Use a Python script to overcome that. Steps:
1. Get Python 3.11.9 installed (ignore if you already have it).
2. Extract the subs that contain the foreign language from the video file.
3. Place the two .srt files (A: made with SE; B: foreign language extracted from the video file) in a folder together with the merge_subs.py script (enclosed below).
4. Rename the SRT files to:
complete_parts.srt (made with SE; the larger file)
missing_parts.srt (the extracted one with the foreign language; the smaller file)
Finally,
run from CMD, from that path (assuming you assigned the path at the time of installing Python):
python merge_subs.py
You will get the final merged_output.srt file, filling in the missing parts that were present in the extracted foreign-language subs. Use that .srt file.
Do use SubSync for syncing. It's too good.
Quote: I learned to sync them manually pretty fast
Do tell me the process to do that manually.
EDIT: enclosed the .TXT file of merge_subs.py. Convert it into a .py file at your end: open it in Notepad and save it as "All files" with the name merge_subs.py.
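For readers without the attachment: the script itself isn't reproduced in the thread, so here is only a rough guess at what a merge along these lines might do (adding cues from missing_parts.srt wherever complete_parts.srt has no cue in the same time range; the real merge_subs.py may work differently):
Code:
# Rough sketch in the spirit of merge_subs.py (the attached script is not shown in the
# thread, so this is a guess): cues from missing_parts.srt are added wherever
# complete_parts.srt has no cue overlapping the same time range.
import re

TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def load(path):
    cues = []
    for block in open(path, encoding="utf-8-sig").read().strip().split("\n\n"):
        lines = block.splitlines()
        m = TIME.search(lines[1]) if len(lines) > 1 else None
        if not m:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = (h1 * 3600 + m1 * 60 + s1) * 1000 + ms1
        end = (h2 * 3600 + m2 * 60 + s2) * 1000 + ms2
        cues.append((start, end, "\n".join(lines[2:])))
    return cues

def fmt(ms):
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

complete = load("complete_parts.srt")
missing = load("missing_parts.srt")

# Keep a missing-parts cue only if nothing in complete_parts overlaps it in time.
extra = [c for c in missing if not any(c[0] < e and s < c[1] for s, e, _ in complete)]

merged = sorted(complete + extra, key=lambda c: c[0])
with open("merged_output.srt", "w", encoding="utf-8") as out:
    for i, (s, e, text) in enumerate(merged, 1):
        out.write(f"{i}\n{fmt(s)} --> {fmt(e)}\n{text}\n\n")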
-
EDIT:
Don't forget to do the final touch-up with SE "Batch"; Settings.xml enclosed herewith. -
CMD > CLI is working fine here, and it even produced better accuracy than SE. Text that was missing from 00:06:30 - 00:08:00 in S01:E01 appeared with the CLI command. This is more accurate than SE.
faster-whisper-xxl.exe "Kabul - 1x01 - The Fall$_2.eng.aac" ^
--task translate ^
--word_timestamps True ^
--model medium ^
--device cpu ^
--compute_type int8 ^
--vad_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--output_dir "source"
[Attachment 89433]
Run (double-click) the merge_subs.py script to produce merged_output.srt, and do the final touch-up with SE Batch using settings.xml.