Subtitle Edit 4.0.3 and 3.6.13

26th Oct 2025 12:21 #31
sam12345

Member

Join Date
Oct 2025
Got it. BTW, I found the perfect way to do the subs with Purfview's Faster-Whisper-XXL in Subtitle Edit.

1. Download ffmpeg and Purfview's Faster-Whisper-XXL under Video > Audio to text (Whisper)...

2. In Advanced Parameters, enter: --compute_type int8 --beam_size 2 --best_of 1 --temperature 0 --threads 8 --standard --beep_off

3. Add your video file (mkv/mp4) and press Generate.

You can then refine the subs via SE batch, using the Settings.xml attached herewith.

Attached Files

Settings.xml (99.6 KB, 3 views)
26th Oct 2025 14:34 #32
robena

Member

Join Date
May 2008

Location
France
Nice, thanks.

SE whisper subs are nicely formatted, but they miss a lot of dialogue.

For example, with Kabul S01-E01, which has forced subs only for the foreign-language parts, the SRT file from this whisper model is 16895 bytes long, while the Assembly one is 31317 bytes, almost double.

Whisper has 238 lines; Assembly has 462.
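Cue counts like these can be checked without opening the files in an editor. Here is a minimal Python sketch, assuming standard SRT timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); the function names are made up for illustration:

```python
import re
from pathlib import Path

# One timing line per cue is what an SRT file normally contains.
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}",
    re.MULTILINE,
)

def count_cues(srt_text: str) -> int:
    """Count subtitle cues by counting SRT timing lines."""
    return len(TIMING_LINE.findall(srt_text))

def count_cues_in_file(path: str) -> int:
    """Read an SRT file (a UTF-8 BOM is tolerated) and count its cues."""
    text = Path(path).read_text(encoding="utf-8-sig", errors="replace")
    return count_cues(text)
```

Calling `count_cues_in_file` on the whisper file and on the Assembly file gives the two numbers to compare.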

Assembly is not free, but 456 free hours will last me a long time. And as we saw, it does not work with music, but I don't need that.

I also ran my stand-alone whisper with:

"C:\Users\m1\AppData\Local\Programs\Python\Python311\Scripts\whisper.exe" "A:\proa\test\Kabul S01-E01.mkv" --model medium --task transcribe --word_timestamps True --device cuda

It's maybe 50 times slower than the one in SE, and it gave me a 719-line SRT file, but one full of artifacts, with the same sub repeated many times.
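Runs of identical subs like these can be spotted and collapsed with a few lines of Python. This is only a sketch: it assumes plain SRT input, the helper names are hypothetical, and it will also drop a line that was genuinely spoken twice in a row, so the result still needs a review.

```python
import re

TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)

def parse_srt(text):
    """Return a list of (start, end, text) tuples from SRT text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                body = "\n".join(lines[i + 1:]).strip()
                cues.append((m.group(1), m.group(2), body))
                break
    return cues

def drop_consecutive_repeats(cues):
    """Keep only the first cue of any run of cues with identical text."""
    kept = []
    for cue in cues:
        if kept and kept[-1][2].casefold() == cue[2].casefold():
            continue  # same text as the previous cue: treat as an artifact
        kept.append(cue)
    return kept
```

Comparing `len(cues)` before and after gives a rough idea of how much of the file is repetition.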

Overall, none of these subs feels natural.

I'd say that Assembly is the least bad, and actually watchable.

I could not watch with the standalone whisper subs when I tried them before using Assembly; they looked too bad. With SE whisper, half of the dialogue is missing.

26th Oct 2025 20:52 #33
sam12345

Member

Join Date
Oct 2025
Originally Posted by robena

Whisper has 238 lines, Assembly has 462 ones

Try it with [--model large-v2 --beam_size 5 --best_of 5]:

--model large-v2 --beam_size 5 --best_of 5 --task transcribe --word_timestamps True --temperature 0 --device cuda --standard --beep_off

If [--beam_size 5 --best_of 5] gives bad results, switch to [--beam_size 2 --best_of 1].

Assembly needs ACR correction, and they now give 350 free hours instead of 456.

Last edited by sam12345; 26th Oct 2025 at 22:24.

26th Oct 2025 21:39 #34
sam12345

Member

Join Date
Oct 2025
Originally Posted by robena

"C:\Users\m1\AppData\Local\Programs\Python\Python311\Scripts\whisper.exe" "A:\proa\test\Kabul S01-E01.mkv" --model medium --task transcribe --word_timestamps True --device cuda

Install Faster-Whisper-XXL_r192.3.4_windows.7z from https://github.com/Purfview/whisper-standalone-win/releases

Then run "faster-whisper-xxl.exe" from a CMD window opened in the installation folder, or from any folder after adding the installation path to the PATH environment variable.
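Before running the command from an arbitrary folder, it may help to check that Windows can actually find the executable. A small Python sketch follows; `find_whisper_exe` and its `install_dir` parameter are hypothetical names, and only `faster-whisper-xxl.exe` comes from this thread:

```python
import os
import shutil

def find_whisper_exe(install_dir=None, name="faster-whisper-xxl.exe"):
    """Return the full path to the executable, or None if it cannot be found.

    The installation folder is checked first (if given); otherwise the
    folders listed in the PATH environment variable are searched.
    """
    if install_dir:
        candidate = os.path.join(install_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return shutil.which(name)
```

If this returns `None`, the installation folder has not been added to PATH yet, or the name is misspelled.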

Get the CUDA libraries from here: https://github.com/Purfview/whisper-standalone-win/releases/tag/libs

Last edited by sam12345; 26th Oct 2025 at 22:21.

27th Oct 2025 01:04 #35
robena

Member

Join Date
May 2008

Location
France
Thanks, I'll try all that.

I was having fun lately using Visual Studio 2022 to remotely debug a Linux project. VS2022 starts gdb remotely, and you get the nice UI in Windows to step through the code, see variables, set breakpoints, etc.

Do you do development on both Windows and Linux?

If you do, I'll share.

27th Oct 2025 03:26 #36
sam12345

Member

Join Date
Oct 2025
Windows only.

27th Oct 2025 03:29 #37
robena

Member

Join Date
May 2008

Location
France
I tested Faster-Whisper-XXL_r245.1. It's much faster and better than the whisper.exe I was using, thanks.

But it's still missing a lot of dialog.

Example with Assembly:

Code:

And that more and endless American military force could not create or sustain a durable Afghan government. I've concluded that it's time to end America's longest war.

With whisper:

Code:

could not create or sustain a durable Afghan government. I've concluded that it's time to end America's longest war.

This whisper output is much better formatted, with fewer subs for the same content because it groups phrases, but it missed the beginning.

Edit: I'll test other models.
Last edited by robena; 27th Oct 2025 at 03:34.
27th Oct 2025 03:47 #38
sam12345

Member

Join Date
Oct 2025
Use this command in CMD:

faster-whisper-xxl.exe "My_Video.mkv" ^
--task transcribe ^
-l en ^
-m large-v2 ^
--device cuda ^
--word_timestamps True ^
--compute_type float16 ^
--vad_alt_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--max_line_count 2 ^
--max_line_width 32 ^
--output_dir "source"

Replace 'My_Video.mkv' with your video file. You can change '--threads 8' to suit your system's CPU.

Using --max_line_count 2 with --max_line_width 32 (or --max_line_width 36) will balance the lines.
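If you run this on many files, the same command can be assembled from a script. The sketch below only builds the argument list from the flags listed in this post. `build_whisper_cmd` is a hypothetical helper, and defaulting `--threads` to the logical core count is an assumption, not something the tool's documentation is known to recommend.

```python
import os

def build_whisper_cmd(video, threads=None, max_line_width=32,
                      exe="faster-whisper-xxl.exe"):
    """Build the argument list for the command shown in this post."""
    if threads is None:
        # Assumption: fall back to the logical core count, or 8 if unknown.
        threads = os.cpu_count() or 8
    return [
        exe, video,
        "--task", "transcribe",
        "-l", "en",
        "-m", "large-v2",
        "--device", "cuda",
        "--word_timestamps", "True",
        "--compute_type", "float16",
        "--vad_alt_method", "silero_v4",
        "--beam_size", "2",
        "--best_of", "1",
        "--temperature", "0",
        "--threads", str(threads),
        "--max_line_count", "2",
        "--max_line_width", str(max_line_width),
        "--output_dir", "source",
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`; because it is a list rather than one string, file names with spaces need no extra quoting.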

27th Oct 2025 03:48 #39
robena

Member

Join Date
May 2008

Location
France
I tried the --diarize reverb_v2 model, supposed to be the most accurate. Still missing a lot of dialog.

Edit: I did not see your post above while posting this. I will try it.

27th Oct 2025 03:51 #40
sam12345

Member

Join Date
Oct 2025
Does Assembly accept these flags?

--task transcribe ^
-l en ^
-m large-v2 ^
--device cuda ^
--word_timestamps True ^
--compute_type float16 ^
--vad_alt_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--max_line_count 2 ^
--max_line_width 32 ^

SE accepts all of them except vad_method silero_v4.

27th Oct 2025 04:12 #41
robena

Member

Join Date
May 2008

Location
France
I tried your settings, thanks, but it's still missing a lot of dialogue.

But it's free and nicely formatted, so, usable if you have a lot of files to process.

No idea for Assembly, you could ask their support.

27th Oct 2025 04:20 #42
sam12345

Member

Join Date
Oct 2025
Originally Posted by robena

I tested Faster-Whisper-XXL_r245.1. It's much faster and better than the whisper.exe I was using, thanks.

Use version r192.3.4. Note that this version does not accept --diarize reverb_v2.

27th Oct 2025 05:33 #43
robena

Member

Join Date
May 2008

Location
France
I will, and I'll report whether it's better on this particular clip.

27th Oct 2025 08:53 #44
sam12345

Member

Join Date
Oct 2025
Try model "medium"

Last edited by sam12345; 27th Oct 2025 at 09:56.

27th Oct 2025 21:22 #45
sam12345

Member

Join Date
Oct 2025
Try this in Faster-Whisper-XXL_r245.4:

faster-whisper-xxl.exe "My_Video.mkv" ^
--task transcribe ^
-l en ^
-m medium ^
--device cuda ^
--compute_type float16 ^
--vad_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--max_line_count 2 ^
--max_line_width 36 ^
--output_dir "source"

Kindly adjust the number of threads to suit your system's CPU. This version (r245.4) accepts "--vad_method silero_v4", while version r192.3 accepts "--vad_alt_method silero_v4".
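For scripts that must work with both releases, the flag name can be chosen from the release string. This is a sketch under a clear assumption: the thread only names r192.3 and r245.4, so the cut-off used below (245) is a guess, and `vad_flag_for_release` is a hypothetical helper.

```python
import re

def vad_flag_for_release(release: str) -> str:
    """Pick the VAD option name for a release string such as 'r245.4'."""
    m = re.match(r"r(\d+)", release.strip())
    if not m:
        raise ValueError(f"unrecognised release string: {release!r}")
    # Assumption: the option was renamed somewhere between r192.3 and r245.4.
    # The exact release is not stated in the thread, so 245 is only a guess.
    return "--vad_method" if int(m.group(1)) >= 245 else "--vad_alt_method"
```

For a release between the two, check the output of the executable's `--help` rather than trusting the cut-off.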

27th Oct 2025 22:41 #46
robena

Member

Join Date
May 2008

Location
France
I have a Threadripper 9960X, so I used 48 threads.

CPU load was between 10 and 35%. GPU at 30%. It runs FAST!

Whisper does better grouping; the subs look less AI-generated than with Assembly.

But there are artifacts: the same sub repeated, and some subs with the wrong timestamp, displayed 20 seconds too soon.

I thought this might be caused by a language change. For this video, I have the forced subs for the foreign dialogue and only need subs for the English dialogue, to be merged afterwards with SE.

So I used:

--model medium.en --language en --task transcribe --device cuda --compute_type float16 --vad_method silero_v4 --beam_size 1 --temperature 0 --threads 48 --initial_prompt "Spoken in English." --condition_on_previous_text False --max_line_count 2 --max_line_width 40 --output_dir "E:\Proe"

Most of the foreign part was ignored, but not all. Total number of subs is close to Assembly this time.

I did not check every sub, but the 20 s errors after a language change did not happen at this particular location.

So, with multiple languages, use one pass for each and then merge using SE.

I'll watch an entire episode made with whisper. If I see bad things, I'll compare with Assembly and report.

The output needs to be processed by SE anyway to correct some common errors, mostly timing.

Thanks!

27th Oct 2025 22:57 #47
robena

Member

Join Date
May 2008

Location
France
Edit: this fails on some files with:

Could not find codec parameters for stream 2 (Subtitle: hdmv_pgs_subtitle (pgssub)): unspecified size

The solution is to extract the audio first:

ffmpeg.exe -analyzeduration 200M -probesize 200M -i "your_file.mkv" -vn -acodec pcm_s16le -ar 16000 -ac 1 "audio.wav"

and process audio.wav.
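To do this for a whole folder, the ffmpeg call can be wrapped in a short Python script. The sketch below mirrors the command above (16 kHz, mono, 16-bit PCM, no video); the function names are hypothetical, and it assumes ffmpeg.exe can be found on PATH.

```python
import subprocess

def build_extract_cmd(src, dst="audio.wav", ffmpeg="ffmpeg.exe"):
    """Argument list mirroring the ffmpeg command above."""
    return [
        ffmpeg,
        "-analyzeduration", "200M", "-probesize", "200M",
        "-i", src,
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM
        "-ar", "16000",           # 16 kHz sample rate
        "-ac", "1",               # mono
        dst,
    ]

def extract_audio(src, dst="audio.wav"):
    """Run ffmpeg; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_extract_cmd(src, dst), check=True)
    return dst
```

The resulting WAV file is then passed to faster-whisper-xxl.exe in place of the MKV.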

27th Oct 2025 23:05 #48
sam12345

Member

Join Date
Oct 2025
try --beam_size 2 --best_of 1

Last edited by sam12345; 27th Oct 2025 at 23:17.

28th Oct 2025 06:22 #49
robena

Member

Join Date
May 2008

Location
France
That was useful.

I am running now:

--model medium.en --language en --task transcribe --device cuda --compute_type float16 --vad_method silero_v5_fw --vad_threshold 0.4 --vad_min_speech_duration_ms 200 --vad_min_silence_duration_ms 300 --hallucination_silence_threshold 0.3 --no_speech_threshold 0.6 --logprob_threshold -2.0 --beam_size 3 --best_of 1 --temperature 0.1 --repetition_penalty 1.05 --no_repeat_ngram_size 3 --condition_on_previous_text False --language_detection_segments 5 --language_detection_threshold 0.85 --max_line_count 2 --max_line_width 40

I'll have to watch an entire episode, but it seems to have suppressed the hallucinations for foreign languages.

And as said before, subs look much more human made than Assembly ones, using this model was a very good suggestion.

I may try:

--model large-v3 --task translate --multilingual --device cuda --compute_type float16 --vad_method silero_v5_fw --vad_threshold 0.4 --beam_size 3 --best_of 1 --temperature 0.1 --repetition_penalty 1.05 --no_repeat_ngram_size 3 --condition_on_previous_text False --hallucination_silence_threshold 0.3 --no_speech_threshold 0.6 --logprob_threshold -2.0 --compression_ratio_threshold 1.2 --max_line_count 2 --max_line_width 40 --output_format srt

to translate everything, but since here I have human made subs for the foreign language, it's best to merge them.

Having fun trying all that here!

28th Oct 2025 06:51 #50
sam12345

Member

Join Date
Oct 2025
--model large-v3 and --model large-v2 are less compatible with Faster Whisper XXL. Stick to model medium and set --temperature to 0. The rest is all fine. Do report how it compares to Assembly AI.

Last edited by sam12345; 28th Oct 2025 at 08:57.

28th Oct 2025 09:55 #51
sam12345

Member

Join Date
Oct 2025
@robena

For ffmpeg tasks you can try Clever FFmpeg GUI. It's very nicely designed.

Originally Posted by robena

For example with Kabul S01-E01 that has only forced subs for foreign language

Try these subs > https://drive.google.com/drive/folders/1Mx_IHMIR452NP1104NS2mwsUtY3vS1AZ?usp=sharing

Last edited by sam12345; 28th Oct 2025 at 11:03.

28th Oct 2025 11:45 #52
robena

Member

Join Date
May 2008

Location
France
Originally Posted by sam12345

Try these subs > https://drive.google.com/drive/folders/1Mx_IHMIR452NP1104NS2mwsUtY3vS1AZ?usp=sharing

Thanks, these are the foreign language only subs similar to the ones I have.

That's why I need to use whisper to generate the ones for the English dialogue.

Last edited by robena; 28th Oct 2025 at 12:07.

28th Oct 2025 21:06 #53
sam12345

Member

Join Date
Oct 2025
Try these > you won't regret it

https://drive.google.com/drive/folders/1vueh2NFWjSPRkKFhhRRoezuILVy_wBHI?usp=sharing

28th Oct 2025 21:25 #54
robena

Member

Join Date
May 2008

Location
France
Originally Posted by sam12345

Try these > you won't regret it

https://drive.google.com/drive/folders/1vueh2NFWjSPRkKFhhRRoezuILVy_wBHI?usp=sharing

These are complete, thanks.

Did you generate them yourself?

With what switches for whisper if you did?

I gave up on generating English-only subs, as that's not reliable, so I'm now trying to translate everything myself.

Your subs seem great, I'll have to watch fully to confirm.

28th Oct 2025 21:50 #55
sam12345

Member

Join Date
Oct 2025
Yes, I made them myself. Steps:

1. Extract the audio of all 6 episodes with Clever FFmpeg GUI.

1A. Add all of them in SE.

2. Use SE 4.0.14 (the latest) with model "medium", not "medium.en". Medium has all the languages.

3. Enter a single command in SE's "Advanced" box: --task translate --word_timestamps True --compute_type int8 --vad_method silero_v4 --beam_size 2 --best_of 1 --temperature 0 --threads 8 --standard --beep_off

Finally, run SE "batch" for the final touch-up. Settings.xml is attached herewith.

Attached Files

Settings.xml (100.6 KB, 1 views)
28th Oct 2025 23:24 #56
sam12345

Member

Join Date
Oct 2025
Use SubSync to sync the subs with video if they don't match

29th Oct 2025 00:04 #57
robena

Member

Join Date
May 2008

Location
France
Great find with SE, here is why it works better than my batch tries:

Inside SE, Purfview's Whisper plug-in doesn't feed the whole file to the binary at once.
It slices the audio internally (usually 30–60 s chunks with ~1 s overlap) and calls Faster-Whisper-XXL repeatedly.

That prevents whisper from missing the dialogue that it misses when it processes the whole file at once.
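To make the scheme described here concrete, this is how overlapping chunk boundaries could be computed. It is only an illustration of the description above, not SE's or the plug-in's actual code, and nothing in the thread confirms that SE works this way; `chunk_spans` and its defaults (30 s chunks, 1 s overlap) are assumptions.

```python
def chunk_spans(duration_s, chunk_s=30.0, overlap_s=1.0):
    """Return (start, end) spans in seconds covering 0..duration_s.

    Each span starts `overlap_s` seconds before the previous one ends,
    so speech that straddles a boundary appears in both chunks.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s
    return spans
```

Each span could then be cut out with ffmpeg and transcribed separately, with the span's start time added back to the resulting timestamps.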

It's simple enough that I don't need to reinvent the wheel; I'll just use that.

Just a tiny problem. When I use your exact settings, with the same SE version, the subs I get are missing a few lines of dialogue compared to yours.

Maybe because I have a different MKV version? Time stamps are not the same.

Anyway, all this is VERY useful. I look forward to testing SubSync.

I have thousands of old series episodes with uppercase subs, and when getting proper lowercase ones, they don't sync.

I learned to sync them manually pretty fast, but an auto process will be great.

29th Oct 2025 00:30 #58
sam12345

Member

Join Date
Oct 2025
Originally Posted by robena

Just a tiny problem. When I use your exact settings, using the same SE version, the subs I get miss a small number of dialogs compared to yours.

Maybe because I have a different MKV version? Time stamps are not the same.

Use a Python script to overcome that. Steps:

1. Install Python 3.11.9 (skip this if you already have it).

2. Extract the foreign-language subs from the video file.

3. Place the two .srt files (A: made with SE; B: the foreign-language subs extracted from the video file) in a folder together with the merge_subs.py script, attached below.

4. Rename the .srt files to:

complete_parts.srt (made with SE; the larger file)

missing_parts.srt (the extracted foreign-language one; the smaller file)

5. Finally, open CMD in that folder (assuming you added Python to PATH when you installed it) and run:

python merge_subs.py

You will get the final merged_output.srt, with the missing parts filled in from the extracted foreign-language subs. Use that .srt file.
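The attached script is not reproduced in the thread, so the following is not its code. It is a minimal sketch of what such a merge could look like, assuming the goal is to add every cue from missing_parts.srt that does not overlap in time with a cue already in complete_parts.srt, then sort and renumber. All function names are hypothetical.

```python
import re

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)

def parse_srt(text):
    """Return a list of (start_ms, end_ms, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
                start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
                end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
                cues.append((start, end, "\n".join(lines[i + 1:]).strip()))
                break
    return cues

def fmt(ms):
    """Format milliseconds as an SRT timestamp."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def merge(complete, missing):
    """Add cues from `missing` that overlap nothing in `complete`."""
    merged = list(complete)
    for cue in missing:
        # Two cues overlap if each one starts before the other ends.
        if not any(cue[0] < c[1] and c[0] < cue[1] for c in complete):
            merged.append(cue)
    return sorted(merged)

def to_srt(cues):
    """Serialise cues back to SRT text, renumbering from 1."""
    return "\n".join(
        f"{n}\n{fmt(a)} --> {fmt(b)}\n{t}\n"
        for n, (a, b, t) in enumerate(cues, 1)
    )
```

To use it, read both files (for example with `encoding="utf-8-sig"`), pass each through `parse_srt`, and write `to_srt(merge(complete, missing))` to merged_output.srt. Skipping overlapping cues is a design choice made for this sketch; it avoids two subs on screen at once but drops a foreign-language line whenever SE already produced something for the same moment.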

Do use SubSync for syncing. It's very good.

Originally Posted by robena

I learned to sync them manually pretty fast

Do tell me how you do that manually.

EDIT: I've attached the script as a .txt file. Convert it to a .py file at your end: open it in Notepad and save it with "All files" as the type, under the name merge_subs.py.

Attached Files

python merge_subs.py.txt (600 Bytes, 3 views)
Last edited by sam12345; 29th Oct 2025 at 00:40.
29th Oct 2025 00:46 #59
sam12345

Member

Join Date
Oct 2025
EDIT:

Don't forget to do the final touch-up with SE "Batch". Settings.xml is enclosed herewith.

Attached Files

Settings.xml (97.8 KB, 0 views)
29th Oct 2025 01:50 #60
sam12345

Member

Join Date
Oct 2025
The CLI (run from CMD) works fine here, and it was even more accurate than SE. Text that was missing from 00:06:30 to 00:08:00 in S01E01 appeared with the CLI command.

faster-whisper-xxl.exe "Kabul - 1x01 - The Fall$_2.eng.aac" ^
--task translate ^
--word_timestamps True ^
--model medium ^
--device cpu ^
--compute_type int8 ^
--vad_method silero_v4 ^
--beam_size 2 ^
--best_of 1 ^
--temperature 0 ^
--threads 8 ^
--output_dir "source"

Run the merge_subs.py script to produce merged_output.srt, then do the final touch-up with SE batch using Settings.xml.

Last edited by sam12345; 29th Oct 2025 at 04:32.
