Hello,
I use Subtitle Edit 4.0.13 (the latest version to date), and I see that several models are available for download for Whisper (Audio to Text).
For example:
-medium.en (1.42GB)
-medium.en_q5_0 (539MB)
-large-v3-turbo_q5_0 (547MB)
Which one gives the best results in terms of recognition? (Processing time does not matter to me.)
I also see that there is a choice between different "engines":
-OpenAI
-Purfview's Faster-Whisper-XXL
-CPP (by default)
-CPP cuBLAS
-const-me
-stable-ts
-WhisperX
and I don't know which one to choose, since I don't know what they mean.
This matters because with CPP (selected by default), I get the same sentences twice (on some films) in the generated subtitle file, which cost me a lot of work to correct.
Thank you
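Until a better engine is chosen, back-to-back doubled lines can at least be found automatically. Below is a minimal Python sketch, assuming a standard SRT file (cue number, `start --> end` timing line, text, blank line between cues); the function names are hypothetical and not part of Subtitle Edit. It drops a cue whose text repeats the previous cue's text (ignoring case and extra spaces) and renumbers the rest.

```python
import re
import sys
from pathlib import Path


def parse_srt(text: str) -> list[tuple[str, str]]:
    """Split SRT text into (timing, body) pairs; the cue-number line is discarded."""
    cues = []
    text = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # malformed block: no timing line
        timing = lines[1].strip()  # e.g. "00:00:01,000 --> 00:00:02,000"
        body = "\n".join(line.strip() for line in lines[2:])
        cues.append((timing, body))
    return cues


def normalize(body: str) -> str:
    """Compare texts case-insensitively and ignore whitespace differences."""
    return " ".join(body.split()).lower()


def drop_consecutive_duplicates(cues: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the first cue of any run of cues with identical text."""
    kept: list[tuple[str, str]] = []
    for timing, body in cues:
        if kept and normalize(body) and normalize(body) == normalize(kept[-1][1]):
            continue  # same text as the previous kept cue: treat it as a doubled line
        kept.append((timing, body))
    return kept


def format_srt(cues: list[tuple[str, str]]) -> str:
    """Rebuild SRT text, renumbering cues from 1."""
    blocks = [f"{i}\n{timing}\n{body}" for i, (timing, body) in enumerate(cues, start=1)]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    cues = parse_srt(src.read_text(encoding="utf-8-sig"))
    dst.write_text(format_srt(drop_consecutive_duplicates(cues)), encoding="utf-8")
```

Usage: `python dedupe_srt.py input.srt output.srt` (hypothetical file names). It only catches back-to-back repeats; a sentence repeated further apart, or with slightly different wording, still needs a manual check, and the kept cue keeps its original timing.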
-
Last edited by Nounours18200; 9th Sep 2025 at 08:47.
-
I run faster-whisper-xxl in standalone mode. When I first started using it, I ran it with large-v3. Strange as it may seem, that version gave me a message to the effect that I would get better results with large-v2, so that's all I use now.
-
The best and most accurate results come from the large model,
BUT you will need a GPU with at least 12GB of VRAM, otherwise it will crash.
I have a PC with 8GB of VRAM, and the best model I can use is medium.
The link below gives you an idea of how much GPU VRAM each model needs.
See the "Available models and languages" section.
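As a rough sketch of how that table can be used, the Python below picks the biggest model whose listed VRAM need fits a given card. The figures are the approximate ones from the openai/whisper README (tiny and base ~1GB, small ~2GB, medium ~5GB, turbo ~6GB, large ~10GB) for the reference implementation; the one-gigabyte headroom and the function name are my own assumptions. Quantized models (the `_q5_0` files in Subtitle Edit) and faster-whisper builds can need noticeably less, as the large-v3 on 8GB report later in this thread suggests, so treat the result as a conservative starting point, not a hard limit.

```python
from typing import Optional

# Approximate VRAM figures (GB) from the "Available models and languages" table
# of the openai/whisper README; they apply to the reference PyTorch implementation.
MODELS_BY_VRAM = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("turbo", 6),
    ("large", 10),
]


def largest_model_for(vram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the last model in the list whose VRAM need plus headroom fits the card.

    headroom_gb is a hypothetical safety margin (desktop, drivers, other programs).
    """
    fitting = [name for name, need in MODELS_BY_VRAM if need + headroom_gb <= vram_gb]
    return fitting[-1] if fitting else None
```

The list is ordered by VRAM need, which is only a loose proxy for accuracy. With the default headroom, an 8GB card gets turbo, and large is only suggested from 11GB upward.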
I don't use Subtitle Edit for transcription. I use OpenAI's Whisper (command line) because it has more flexible options for trying to get better results.
Tip: always transcribe in the audio's own language, and do NOT translate to English (unless the audio is already English).
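To make the tip concrete, here is a small Python sketch that only assembles an openai-whisper command line. `--model`, `--task`, `--language`, `--output_format` and `--output_dir` are options of that CLI; the function name and the example file name are hypothetical. `--task transcribe` keeps the text in the spoken language, while `--task translate` is the option to avoid for non-English audio.

```python
import shlex
from typing import Optional


def whisper_cli_args(media: str, model: str = "medium", language: Optional[str] = None,
                     output_format: str = "srt", output_dir: str = ".") -> list[str]:
    """Build an openai-whisper command line that transcribes (never translates)."""
    args = [
        "whisper", media,
        "--model", model,
        "--task", "transcribe",  # keep the original language; "translate" would output English
        "--output_format", output_format,
        "--output_dir", output_dir,
    ]
    if language:
        # Naming the language skips auto-detection, which only looks at the first 30 seconds
        args += ["--language", language]
    return args


if __name__ == "__main__":
    # Print the command so it can be pasted into a terminal
    print(shlex.join(whisper_cli_args("film.mkv", model="medium", language="French")))
```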
https://github.com/openai/whisper
-
For me, faster-whisper-xxl with the large-v3 model works well on an Nvidia RTX 2070 Super with 8GB of VRAM.
-
OK, I see that most of you use the Faster-Whisper-XXL engine: it is the one I finally chose, just before opening this post. It works well, whereas CPP gave me a garbage transcription with doubled lines...
On my main PC there is no problem (it is very powerful), but the task never finished when I launched it in a virtual machine on my NAS.
The VM has enough RAM (16GB), but a NAS has practically no GPU, so your remarks explain why the process never finishes. I suppose I will have to use a smaller model.
I still have a question about the difference between:
-"medium" and "medium.en": they have the same size (1.5GB), so what is the difference?
-some models have "distil" at the beginning of their name: what does that mean?
Thank you very much!
-
The difference between the medium and medium.en models is that medium.en is English-only, so it is preferable when the audio is in English.
You should try both and see which one is more accurate.
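According to the openai/whisper README, English-only `.en` variants exist only for tiny, base, small and medium (there is no large or turbo `.en`), and the README notes that their advantage over the multilingual model shrinks for small.en and medium.en, which fits the advice to try both. A minimal Python sketch of that choice, with a hypothetical helper name:

```python
EN_ONLY_SIZES = {"tiny", "base", "small", "medium"}  # sizes that have a ".en" variant


def pick_model_name(size: str, audio_language: str) -> str:
    """Return "<size>.en" for English audio when such a variant exists, else the multilingual name."""
    if audio_language.strip().lower() in {"en", "english"} and size in EN_ONLY_SIZES:
        return f"{size}.en"
    return size
```

For a French film this keeps `medium`; for English audio it suggests `medium.en`, and a size such as `large-v3` is returned unchanged because no English-only version of it exists.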
As for distil, I have no idea what it means.
-
My thoughts: this is the batch file I call when using:
https://github.com/Purfview/whisper-standalone-win/releases/tag/Faster-Whisper-XXL
which-model.cmd
```bat
@echo off
set which-model=1
echo 1. large-v2 (Default) - the best overall. (slowest)
rem %% prints a literal percent sign inside a batch file
echo 2. distil-large-v3.5 - 3-4 times the speed of large-v2. (95%%-98%% the accuracy of v2)
echo 3. large-v3 - No
echo 4. large-v3-turbo - No
echo 5. medium.en - OK, --sentence parameter does not work too well.
echo 6. small.en - almost as accurate as medium. --sentence parameter works well.
echo 7. tiny.en - fast, fair accuracy. Good for short videos that are to be discarded.
rem Pressing Enter keeps the default set above (1)
set /p which-model=Which model? (1,2,3,4,5,6,7)
if %which-model% equ 1 set model=large-v2
if %which-model% equ 2 set model=distil-large-v3.5
if %which-model% equ 3 set model=large-v3
if %which-model% equ 4 set model=large-v3-turbo
if %which-model% equ 5 set model=medium.en
if %which-model% equ 6 set model=small.en
if %which-model% equ 7 set model=tiny.en
echo.
```
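For readers who would rather script this in Python, here is a sketch of the same menu logic only (it does not launch Faster-Whisper-XXL, and the function name is hypothetical). An empty answer falls back to choice 1, mirroring `set which-model=1` before `set /p`, and an unknown answer raises an error instead of continuing without a model.

```python
MENU = {
    "1": "large-v2",
    "2": "distil-large-v3.5",
    "3": "large-v3",
    "4": "large-v3-turbo",
    "5": "medium.en",
    "6": "small.en",
    "7": "tiny.en",
}


def choose_model(answer: str, default: str = "1") -> str:
    """Map a menu answer to a model name; an empty answer selects the default."""
    key = answer.strip() or default
    if key not in MENU:
        # The batch file would leave %model% unset (or at its previous value) here
        raise ValueError(f"unknown choice {answer!r}; expected one of {', '.join(MENU)}")
    return MENU[key]
```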
Just my choices. YMMV.
-
Thank you very much to all of you: I have most of my answers!
So far, the "Purfview's Faster-Whisper-XXL" engine appears to give by far the best results compared to the others.
In particular, CPP (the default) generates doubled lines, which makes it almost impossible to use.
I mostly transcribe old black-and-white films, but CPP is bad with more ordinary movies too.
Thank you all again.