VideoHelp Forum




  1. Hello,

    I use SubtitleEdit 4.0.13 (latest version to date) and I see that there are different profiles available for download regarding Whisper (Audio to Text).

    For example:
    -medium.en (1.42GB)
    -medium.en_q5_0 (539MB)
    -large-v3-turbo_q5_0 (547MB)
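Most of the size gap between these downloads comes from quantization. This is a rough sketch, not SubtitleEdit's own logic. It assumes the approximate parameter counts from the openai/whisper README (769M for medium, 809M for large-v3-turbo) and ggml's Q5_0 layout, where each block of 32 weights takes 22 bytes (5.5 bits per weight). The helper name `estimated_size_mib` is hypothetical. Real files come out somewhat larger because some tensors stay at higher precision.

```python
# Rough on-disk size estimate for Whisper weights at different precisions.
# Assumptions: parameter counts are the approximate README figures, and
# q5_0 follows ggml's Q5_0 block layout: 32 weights per block, stored as a
# 2-byte fp16 scale + 4 bytes of high bits + 16 bytes of low nibbles
# = 22 bytes per 32 weights = 5.5 bits per weight.

PARAMS = {
    "medium": 769e6,
    "large-v3-turbo": 809e6,
}

BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q5_0": 22 * 8 / 32,  # 5.5 bits per weight
}


def estimated_size_mib(model: str, fmt: str) -> float:
    """Approximate weight size in MiB, ignoring headers and unquantized tensors."""
    total_bits = PARAMS[model] * BITS_PER_WEIGHT[fmt]
    return total_bits / 8 / 2**20


if __name__ == "__main__":
    for model in PARAMS:
        for fmt in BITS_PER_WEIGHT:
            print(f"{model:15s} {fmt:5s} ~{estimated_size_mib(model, fmt):6.0f} MiB")
```

For medium this gives about 1467 MiB in f16 and about 504 MiB in q5_0, which is in the same ballpark as the listed 1.42GB and 539MB. Quantized weights generally cost a little accuracy. If processing time does not matter, the unquantized file is the safer pick.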

    I would like to know which one gives the best recognition results (processing time does not matter to me).

    I also see that there is a choice between different "engines":
    -OpenAI
    -Purfview's Faster-Whisper-XXL
    -CPP (by default)
    -CPP cuBLAS
    -const-me
    -stable-ts
    -WhisperX
    and I don't know which one I should choose, as I don't know what the differences between them are.

    This is important because with CPP (selected by default), I get the same sentences twice (on some films) in the generated subtitle file, which costs me a lot of work to correct.
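If a transcript already contains the repeated lines, a small post-processing pass can remove the simplest case. This minimal sketch assumes the duplicates are consecutive SRT cues whose text is identical after whitespace and case normalization. `dedupe_srt` and its helpers are hypothetical names, not a SubtitleEdit feature. Repeats that are not adjacent, or that differ by a word, are left alone and still need manual review.

```python
import re

# Hypothetical clean-up helpers: collapse consecutive SRT cues with the same
# text into one cue spanning from the first start time to the last end time.


def _key(body: str) -> str:
    # Compare text ignoring extra whitespace and letter case.
    return " ".join(body.split()).casefold()


def parse_srt(text: str) -> list:
    """Return [start, end, body] triples; assumes index line, timing line, text."""
    text = text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed or empty blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append([start, end, "\n".join(lines[2:]).strip()])
    return cues


def merge_duplicates(cues: list) -> list:
    merged = []
    for start, end, body in cues:
        if merged and _key(merged[-1][2]) == _key(body):
            merged[-1][1] = end  # extend the previous cue instead of repeating it
        else:
            merged.append([start, end, body])
    return merged


def format_srt(cues: list) -> str:
    blocks = [f"{i}\n{start} --> {end}\n{body}"
              for i, (start, end, body) in enumerate(cues, 1)]
    return "\n\n".join(blocks) + "\n" if blocks else ""


def dedupe_srt(text: str) -> str:
    """Parse, merge consecutive duplicates, and renumber."""
    return format_srt(merge_duplicates(parse_srt(text)))
```

Typical use would be to read the file with `encoding="utf-8-sig"`, pass its contents through `dedupe_srt`, and write the result to a new file, so the original stays available for comparison.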

    Thank you
    Last edited by Nounours18200; 9th Sep 2025 at 08:47.
    I run faster-whisper-xxl in standalone mode. When I first started using it, I ran it with Large V3. Strange as it may seem, that version gave me a message to the effect that I would get better results with Large V2. So that's all I use now.
  3. Member
    Join Date
    Mar 2021
    Location
    Israel
    The best and most accurate results come from the large model.
    BUT you will need a GPU with at least 12GB of VRAM, otherwise it will crash.
    I have a PC with 8GB of VRAM, and the best model I can use is medium.
    The link below gives you an idea of model size vs. GPU VRAM.
    See the Available models and languages section.
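The VRAM rule of thumb can be sketched as a lookup. This assumes the approximate figures from the openai/whisper README's model table (large ~10GB, medium ~5GB, small ~2GB, base and tiny ~1GB). Those numbers apply to the reference PyTorch implementation. `pick_model` and its `headroom_gb` safety margin are hypothetical. Engines such as Faster-Whisper typically need less memory, which may explain why a later reply runs large-v3 on an 8GB card.

```python
# Approximate VRAM needs (GB) per the openai/whisper README, ordered from
# most to least accurate. These are rough guides, not hard limits.
MODELS_BY_ACCURACY = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(vram_gb: float, headroom_gb: float = 0.0):
    """Return the most accurate model that fits, or None if nothing fits."""
    usable = vram_gb - headroom_gb
    for name, needed_gb in MODELS_BY_ACCURACY:
        if usable >= needed_gb:
            return name
    return None  # no GPU model fits: expect a (very slow) CPU run
```

With `headroom_gb=2`, an 8GB card still lands on medium, which matches the experience described above. A VM on a NAS with no usable GPU gets `None`.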

    I don't use SubtitleEdit for transcription. I use OpenAI Whisper (command line) because it has more flexible options for trying to get better results.
    Tip: Always transcribe in the audio's own language and do NOT translate to English, unless the audio is English.
    https://github.com/openai/whisper
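Following the tip about transcribing rather than translating, the command line can be assembled explicitly. This sketch assumes the `openai-whisper` package's `whisper` command and its `--model`, `--language`, `--task`, `--output_format` and `--output_dir` options. `whisper_command` is a hypothetical helper. With `--task transcribe`, the text stays in the spoken language. With `--task translate`, the output would be English.

```python
import shlex


def whisper_command(audio_path: str, language: str, model: str = "medium",
                    output_dir: str = ".") -> list[str]:
    """Build an openai-whisper CLI call that transcribes in the spoken language."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", language,      # e.g. "fr"; avoids relying on auto-detection
        "--task", "transcribe",      # not "translate", per the tip above
        "--output_format", "srt",
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Print the command so it can be checked before running it.
    print(shlex.join(whisper_command("film.wav", "fr")))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. This requires `openai-whisper` and ffmpeg to be installed.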
    For me, faster-whisper-xxl with the large-v3 model on an Nvidia RTX 2070 Super with 8GB of VRAM works well.
    OK, I see that most of you use the Faster-Whisper-XXL engine: it is the one I finally chose, just before opening this post. It works well, whereas CPP gave me a bullshit transcription with doubled lines...

    On my main PC there is no problem (it is very powerful), but the task never finished when I launched it on a virtual machine on my NAS.

    The VM's RAM is fine (16GB), but the GPU on a NAS is next to nothing, so your remarks explain why the process never terminates. I suppose I will have to use a smaller model.

    I still have the question regarding the difference between:
    -"medium" vs "medium.en": they have the same size (1.5GB), so what is the difference?
    -some models have the "distil" prefix in their name: what does it mean??

    Thank you very much,
  6. Member
    Join Date
    Mar 2021
    Location
    Israel
    The difference between the medium and medium.en models is that medium.en is preferable when the audio is in English.
    You should try both and see which one is more accurate.
    As for distil, I have no idea what it means.
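The README of the OpenAI repository linked above describes the `.en` checkpoints as English-only models. They are available for tiny, base, small and medium only; there is no `large.en`. According to the README, they tend to do better on English audio, mostly for tiny and base, with a smaller gap for small and medium. The `distil-` prefix refers to Hugging Face's Distil-Whisper models. These are smaller, faster versions trained by knowledge distillation from a larger Whisper model. The sketch below chooses between the plain and `.en` names. It assumes the audio language is known in advance, and `model_name` is a hypothetical helper.

```python
# English-only ".en" checkpoints exist only for these sizes (per the README).
EN_VARIANTS = {"tiny", "base", "small", "medium"}


def model_name(size: str, language: str) -> str:
    """Return the checkpoint name to request for a given size and audio language."""
    if language.lower() in {"en", "english"} and size in EN_VARIANTS:
        return f"{size}.en"
    return size  # multilingual model: the only option for non-English audio
```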
  7. Member
    Join Date
    Apr 2007
    Location
    Australia
    My thoughts. This is the batch file I call when using:
    https://github.com/Purfview/whisper-standalone-win/releases/tag/Faster-Whisper-XXL
    which-model.cmd
    Code:
    @echo off
    set which-model=1
    echo 1. large-v2 (Default) - the best overall. (slowest)
    echo 2. distil-large-v3.5 - 3-4 times the speed of large-v2. (95%%-98%% the accuracy of v2)
    echo 3. large-v3  - No
    echo 4. large-v3-turbo - No
    echo 5. medium.en - OK, --sentence parameter does not work too well.
    echo 6. small.en - almost as accurate as medium. --sentence parameter works well.
    echo 7. tiny.en - fast, fair accuracy. Good for short videos that are to be discarded.
    
    set /p which-model=Which model? (1,2,3,4,5,6,7)
    rem Default to large-v2 if the answer is not 1-7
    set model=large-v2
    if "%which-model%"=="2" set model=distil-large-v3.5
    if "%which-model%"=="3" set model=large-v3
    if "%which-model%"=="4" set model=large-v3-turbo
    if "%which-model%"=="5" set model=medium.en
    if "%which-model%"=="6" set model=small.en
    if "%which-model%"=="7" set model=tiny.en
    echo.
    I've put in comments to remind myself why I mostly use large-v2.
    Just my choices. ymmv.
    Thank you very much to all of you: I have most of my answers!

    So far, the "Purfview's Faster-Whisper-XXL" engine appears to give by far the best results compared to the others.
    In particular, CPP (selected by default) generates doubled lines, which makes it almost unusable.

    I mostly use it on old black-and-white films, but CPP is also bad with more typical movies.

    Thank you all again.


