Voice recognition with Whisper in SubtitleEdit

9th Sep 2025 08:39 #1
Nounours18200

Member

Join Date
Feb 2005
Hello,

I use SubtitleEdit 4.0.13 (the latest version to date) and I see that several Whisper (Audio to Text) models are available for download.

For example:
-medium.en (1.42GB)
-medium.en_q5_0 (539MB)
-large-v3-turbo_q5_0 (547MB)

I would like to know which one gives the best recognition results (processing time does not matter to me).

I also see that there is a choice between different "engines":
-OpenAI
-Purfview's Faster-Whisper-XXL
-CPP (by default)
-CPP cuBLAS
-const-me
-stable-ts
-WhisperX
and I don't know which one I should choose (as I don't know what they mean).

This is important because with CPP (selected by default), I get the same sentences twice (on some films) in the generated subtitle file, which costs me a lot of work to correct.

Thank you

Last edited by Nounours18200; 9th Sep 2025 at 08:47.

9th Sep 2025 11:18 #2
Moralez

Member

Join Date
May 2015

Location
USA
I run faster-whisper-xxl in standalone mode. When I first started using it I ran it with large-v3. Strangely, that version gave me a message to the effect that I would get better results with large-v2, so that's all I use now.

9th Sep 2025 11:25 #3
Subtitles

Member

Join Date
Mar 2021

Location
Israel
You get the best and most accurate results with the large model.
BUT you will need a GPU with at least 12GB of VRAM, otherwise it will crash.
I have a PC with 8GB of VRAM and the best model I can use is medium.
The link below gives you an idea of which model fits which amount of GPU VRAM.
See the "Available models and languages" section.
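
As a rough illustration of the model versus VRAM idea, here is a minimal Python sketch. The figures are the approximate requirements listed in the openai/whisper README for the original OpenAI engine; they are guides, not measurements, and other engines may need less. The function name `largest_model_that_fits` is made up for this example.

```python
# Approximate VRAM needs (in GB) as listed in the openai/whisper README.
# Rough guides for the OpenAI engine only; other engines may differ.
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(vram_gb):
    """Return the largest model whose approximate need is <= vram_gb, or None."""
    best = None
    for name, need in APPROX_VRAM_GB:  # ordered from smallest to largest
        if need <= vram_gb:
            best = name
    return best


if __name__ == "__main__":
    for vram in (4, 8, 12):
        print(vram, "GB ->", largest_model_that_fits(vram))
```

With these approximate figures, an 8GB card lands on medium and a 12GB card on large, which matches the experience described above.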

I don't use SubtitleEdit for transcription. I use OpenAI (command line) because it has more flexible options to try and get better results.
Tip: always transcribe in the language of the audio and do NOT translate to English, unless the audio is English.
https://github.com/openai/whisper
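
To make the transcribe-versus-translate tip concrete, here is a small Python sketch that only builds the command line for the openai/whisper tool and does not run it. It assumes the `whisper` command is installed and on the PATH; the file name `film.mkv`, the language code `fr` and the helper name `build_whisper_command` are placeholders for this example.

```python
import shlex


def build_whisper_command(audio_path, model="medium", language=None,
                          task="transcribe"):
    """Build an argument list for the openai/whisper command-line tool."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    # --task transcribe keeps the spoken language; --task translate
    # would produce English text instead.
    cmd = ["whisper", audio_path, "--model", model, "--task", task,
           "--output_format", "srt"]
    if language:
        cmd += ["--language", language]
    return cmd


if __name__ == "__main__":
    # Placeholder file name and language code, for illustration only.
    print(shlex.join(build_whisper_command("film.mkv", language="fr")))
```

The resulting list can be passed to `subprocess.run` if you want to launch the transcription from a script.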

9th Sep 2025 13:38 #4
ChristianundCo

Member

Join Date
Sep 2020

Location
Germany
For me, faster-whisper-xxl with the large-v3 model works well on an Nvidia RTX 2070 Super with 8GB of VRAM.

10th Sep 2025 11:27 #5
Nounours18200

Member

Join Date
Feb 2005
OK, I see that most of you use the Faster Whisper XXL engine: it is the one I finally chose just before opening this thread. It works well, whereas CPP gave me a bullshit transcription with doubled lines...

On my main PC there is no problem (it is very powerful), but the task never finished when I launched it in a virtual machine on my NAS.

The VM has enough RAM (16GB), but the GPU on a NAS is more or less nonexistent, so your remarks explain why the process never terminates. I suppose I will have to use a smaller model.

I still have questions about:
-"medium" and "medium.en": they have the same size (1.5GB), so what is the difference?
-some models have the "distil" label at the beginning of their name: what does it mean?

Thank you very much,

10th Sep 2025 11:45 #6
Subtitles

Member

Join Date
Mar 2021

Location
Israel
The difference between the medium and medium.en models is that medium.en is an English-only model, so it is preferable when the audio is in English.
You should try both and see which one is more accurate.
As for distil, I have no idea what it means.


10th Sep 2025 16:02 #7

Member

My thoughts. This is the batch file (which-model.cmd) that I call when using:
https://github.com/Purfview/whisper-standalone-win/releases/tag/Faster-Whisper-XXL

Code:

@echo off
rem Default to option 1 (large-v2) if the user just presses Enter.
set which-model=1
echo 1. large-v2 (Default) - the best overall. (slowest)
rem Percent signs are doubled so that cmd prints them literally.
echo 2. distil-large-v3.5 - 3-4 times the speed of large-v2. (95%%-98%% the accuracy of v2)
echo 3. large-v3  - No
echo 4. large-v3-turbo - No
echo 5. medium.en - OK, --sentence parameter does not work too well.
echo 6. small.en - almost as accurate as medium. --sentence parameter works well.
echo 7. tiny.en - fast, fair accuracy. Good for short videos that are to be discarded.

set /p which-model=Which model? (1,2,3,4,5,6,7)
rem Fall back to large-v2, then map the menu number to a model name.
rem Quoted comparisons avoid syntax errors on unexpected input.
set model=large-v2
if "%which-model%"=="1" set model=large-v2
if "%which-model%"=="2" set model=distil-large-v3.5
if "%which-model%"=="3" set model=large-v3
if "%which-model%"=="4" set model=large-v3-turbo
if "%which-model%"=="5" set model=medium.en
if "%which-model%"=="6" set model=small.en
if "%which-model%"=="7" set model=tiny.en
echo.

I've put in comments to remind myself why I mostly use large-v2.

Just my choices. ymmv.
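
The batch file above stops once the model name is chosen. As a hedged sketch of the next step, here is the same menu mapping in Python, followed by the construction of a Faster-Whisper-XXL command line. It assumes the executable is called `faster-whisper-xxl.exe` and that the model is passed with `--model`; the helper names and the file name `film.mkv` are made up for this example.

```python
# Menu numbers mapped to model names, mirroring the batch file above.
MODELS = {
    "1": "large-v2",
    "2": "distil-large-v3.5",
    "3": "large-v3",
    "4": "large-v3-turbo",
    "5": "medium.en",
    "6": "small.en",
    "7": "tiny.en",
}
DEFAULT_CHOICE = "1"


def model_for_choice(choice):
    """Map a menu answer to a model name, falling back to the default."""
    return MODELS.get(choice.strip(), MODELS[DEFAULT_CHOICE])


def build_command(media_path, choice, exe="faster-whisper-xxl.exe"):
    """Build (but do not run) the command line for the chosen model.

    The executable name and the --model flag are assumptions here.
    """
    return [exe, media_path, "--model", model_for_choice(choice)]


if __name__ == "__main__":
    # "film.mkv" is a placeholder file name.
    print(build_command("film.mkv", "6"))
```

An empty or unknown answer falls back to large-v2, the default of the menu.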


11th Sep 2025 05:35 #8
Nounours18200

Member

Join Date
Feb 2005
Thank you very much to all of you: I have most of my answers!

To date, it appears that the "Purfview's Faster-Whisper-XXL" engine is the one that provides the best results by far, compared to the others.
CPP in particular (enabled by default) generates duplicated lines, which makes it almost impossible to use.

I mainly work on old black-and-white films, but CPP is also bad with more conventional movies.
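
For subtitle files that already contain the doubled lines, a cleanup pass can save some manual work. The following is a minimal Python sketch, not a SubtitleEdit feature: it assumes a plain SRT file and that the duplicates are consecutive cues with identical text. All function names are made up for this example.

```python
import re


def parse_srt(text):
    """Split SRT text into (timing_line, cue_text) pairs."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Expected layout: index line, timing line, one or more text lines.
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append((lines[1].strip(), "\n".join(lines[2:]).strip()))
    return cues


def drop_consecutive_duplicates(cues):
    """Merge a cue into the previous one when its text repeats.

    The merged cue keeps the first start time and the last end time.
    """
    merged = []
    for timing, text in cues:
        if merged and merged[-1][1].casefold() == text.casefold():
            start = merged[-1][0].split("-->")[0].strip()
            end = timing.split("-->")[1].strip()
            merged[-1] = (f"{start} --> {end}", text)
        else:
            merged.append((timing, text))
    return merged


def to_srt(cues):
    """Serialise cues back to SRT text, renumbering from 1."""
    blocks = [f"{i}\n{timing}\n{text}"
              for i, (timing, text) in enumerate(cues, 1)]
    return "\n\n".join(blocks) + "\n"
```

Note that this also merges dialogue that is genuinely repeated (for example two consecutive "No." lines), so the result still deserves a quick review.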

Thank you all again.

