How I use whisper-faster on my machine

18th Aug 2023 18:57 #1
pcspeak

Member
Join Date: Apr 2007
Location: Australia

This was all tested on Windows 10 x64.
I have added subtitles to approximately 20 videos using this method.

The brand of video card and the amount of memory on it are important.
For me, my NVIDIA card with 4 GB of memory works with the medium model.
For the large model I have to fall back to the computer's CPU. (--device=cpu)

If your computer only has 4 GB of system memory, then whisper will probably fail.
Test each option. The medium model running on the computer's CPU should work on most newer computers.

Download this and extract it to an empty folder. (e.g. d:\whisp\whisper-faster.exe)
https://github.com/Purfview/whisper-standalone-win/releases/download/faster-whisper/Wh...ter_r145.3.zip
Download this and extract it to the same folder.
https://github.com/Purfview/whisper-standalone-win/releases/download/libs/cuBLAS.and.cuDNN.7z
Add a copy of ffmpeg.exe to the same folder.
https://www.gyan.dev/ffmpeg/builds/ffmpeg-git-essentials.7z (x64 only)

If the required model DOES NOT EXIST locally, whisper-faster will download it and place it in the correct sub-folder.

The default in the following batch file is for the tiny model. You just want to get it to work. Worry about larger models later.
Wait for the download of the model to finish. The transcription will start automatically after the download.

Code:

:1.
:: WITH an 8Gig nVidia video card using onboard gpu.
:: whisper-faster.exe "my movie.mkv" --language=English --model=large-v2 --output_format srt

:2.
:: WITHOUT an 8Gig nVidia video card. (runs on system memory. slower than GPU)
:: whisper-faster.exe "my movie.mkv" --device=cpu --language=English --model=large-v2 --output_format srt

:3.
:: WITH a 4Gig nVidia video card.
:: whisper-faster.exe "my movie.mkv" --language=English --model=medium --output_format srt

:4.
:: WITHOUT a 4Gig nVidia video card. (runs on system memory. slower than GPU)
:: whisper-faster.exe "my movie.mkv" --device=cpu --language=English --model=medium --output_format srt

:5.
:: The tiny model to get things working.
whisper-faster.exe "my movie.mkv" --language=English --model=tiny --output_format srt

:6.
:: The tiny model using the computer's cpu to get things working. Try this if the one above (:5.) fails.
::whisper-faster.exe "my movie.mkv" --device=cpu --language=English --model=tiny --output_format srt

pause
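
The six cases in the batch file boil down to one decision: how much video memory is available. A minimal Python sketch of that decision follows. The function name `build_command` and the exact cut-offs are assumptions for illustration (they mirror the poster's experience, not any documented limit), and only flags that appear in the batch file are used.

```python
from typing import List


def build_command(video: str, vram_gb: float,
                  exe: str = "whisper-faster.exe") -> List[str]:
    """Return an argument list for whisper-faster based on available VRAM.

    Thresholds follow the numbered cases in the batch file:
    8 GB -> large-v2 on GPU, 4 GB -> medium on GPU, otherwise tiny on CPU.
    """
    if vram_gb >= 8:
        model, on_cpu = "large-v2", False
    elif vram_gb >= 4:
        model, on_cpu = "medium", False
    else:
        model, on_cpu = "tiny", True

    cmd = [exe, video]
    if on_cpu:
        cmd.append("--device=cpu")
    cmd += ["--language=English", f"--model={model}", "--output_format", "srt"]
    return cmd


# Show the command a 4 GB card would get.
print(build_command("my movie.mkv", vram_gb=4))
```

The list can be handed to `subprocess.run` as-is, which avoids quoting problems with file names that contain spaces.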

I put the accuracy at 90-95 percent.
I feed each output srt file to Subtitle Edit.
Tools/Fix common errors
then
Tools/Break/Split long lines
then
Spell check.
The saved result is close enough for me. The Start timing marks can be off. I don't care.

The whole process, once I got it working, is excellent.
Cheers.
Last edited by pcspeak; 18th Aug 2023 at 19:44. Reason: Punctuation and clarity
20th Aug 2023 12:33 #2
VoodooFX

Video Damager
Join Date: Oct 2021
Location: At Doom9

Originally Posted by pcspeak

For me, my nVidia card with 4 Gig of memory works with the medium model.

You can fit the large model in 4 GB of VRAM with "-ct=int8".

Originally Posted by pcspeak

Add a copy of ffmpeg

It doesn't use ffmpeg in any way.

Originally Posted by pcspeak

--output_format srt

You don't need to specify it because it's the default setting.

Originally Posted by pcspeak

I put the accuracy at 90-95 percent.

You can check if "-bs=5" improves accuracy.

Last edited by VoodooFX; 20th Aug 2023 at 14:36.

InpaintDelogo - advanced logo removal & hardcoded subtitles extraction
Standalone Faster-Whisper - Portable AI auto-transcription-translation

20th Aug 2023 15:50 #3
pcspeak

Originally Posted by VoodooFX

You can fit the large model in 4 GB of VRAM with "-ct=int8".

I didn't know that. Thanks. Testing for speed is where I'm at at the moment.

Originally Posted by VoodooFX

It doesn't use ffmpeg in any way.

Yeah, I know. I've been testing. This does.
https://superuser.com/questions/1778870/how-do-i-use-ffmpeg-and-openai-whisper-to-tran...-a-rtmp-stream
Earlier renditions of OpenAI's Whisper needed ffmpeg.
I'm a belt and braces type. Having ffmpeg in the folder or in my %Path% doesn't break anything.

Originally Posted by VoodooFX

--output_format srt
You don't need to specify it because it's the default setting.

It's there because sometimes I change the format of the subtitles to vtt or txt. In post-processing, it's mostly about how well Subtitle Edit handles the newly created subtitles. (Thanks Nik!) There can be minor differences and I'm just trying to get my head around which will be the most accurate for the creation of subs for all the episodes of Salvage Hunters I've recorded over the years. The Welsh names are interesting to deal with.

Originally Posted by VoodooFX

You can check if "-bs=5" improves accuracy.

OK, I'll test that.

Cheers!

20th Aug 2023 16:21 #4
VoodooFX

Originally Posted by pcspeak

Earlier renditions of OpenAI's Whisper needed ffmpeg.

The latest version needs it too, but your post is about Faster-Whisper.

Btw, instead of "--output_format" you can use the shorter alternative -> "-f".

20th Aug 2023 17:18 #5
pcspeak

@VoodooFX
On my machine (see my profile) with the NVIDIA 4 GB card, on a 44 minute video:
Large model GPU - did not work (out of memory)
Medium model GPU - 10 minutes
Medium model CPU - 27 minutes

With your recommended parameters:
Large model GPU - 8.4 minutes (Winner!)
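
To put those timings on one scale, here is a small Python sketch that converts them to a "times faster than real time" figure. The numbers are the ones reported above; the dictionary labels and the helper name `realtime_factor` are made up for illustration.

```python
VIDEO_MINUTES = 44.0

# Processing times reported in the post, in minutes.
runs = {
    "medium, GPU": 10.0,
    "medium, CPU": 27.0,
    "large, GPU, recommended parameters": 8.4,
}


def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Minutes of audio transcribed per minute of processing."""
    return audio_minutes / processing_minutes


factors = {name: round(realtime_factor(VIDEO_MINUTES, minutes), 2)
           for name, minutes in runs.items()}

for name, factor in factors.items():
    print(f"{name}: {factor}x real time")
```

On these figures the GPU runs land between roughly 4x and 5x real time, and the CPU run is a bit under 2x.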

Code:

"D:\whisperf\whisper-faster.exe" "D:\a\my movie.mkv" --language=English --model=large-v2 -ct=int8 -bs=5 --output_dir "%%~dpa\" --output_format srt

Now to check the accuracy.
Cheers.
20th Aug 2023 19:41 #6
VoodooFX

Originally Posted by pcspeak

Code:

"D:\whisperf\whisper-faster.exe" "D:\a\my movie.mkv" --language=English --model=large-v2 -ct=int8 -bs=5 --output_dir "%%~dpa\" --output_format srt

Same command in short:

Code:

"D:\whisperf\whisper-faster.exe" "D:\a\my movie.mkv" -l=en -m=large-v2 -ct=int8 -bs=5 -o=source -f=srt
22nd Aug 2023 16:24 #7
pcspeak

@VoodooFX - Short or long names for parameters? Some time ago I stopped using the shorter abbreviations when there was a choice.
When I checked whisper-faster.exe --help at a command prompt, neither of the shortened parameters showed.
You and I know what the abbreviated parameters mean. Other VideoHelp members may not.

I got around to checking the transcription accuracy on my 44 minute video.
--compute_type int8 (-ct=int8) gives me an empty srt file. I tried many options but could not get the large model to give me an srt file that was NOT empty. I'm staying with the medium.en model for now.

--beam_size 5 (-bs=5) gives a definite improvement in accuracy, but took 50% longer to process. The video I'm using has Welsh, French and English words.
In the first 5 minutes of the video I found 4 occasions where the output srt, using --beam_size 5, was more accurate. beam_size 5 is staying in my batch file(s).

My next run will be on 5 videos in D:\a\ to get a more accurate reading of times taken, and just how good the transcription is.
This is the batch file I'm using:

Code:

@echo off
for %%a in ("d:\a\*.mkv") do if exist "%%~dpna.srt" (
    echo "%%a" - srt file already exists.
) else (
    echo "%%a" && "D:\whisperf\whisper-faster.exe" "%%a" --language=English --model=medium.en --beam_size 5 --output_dir "%%~dpa\" --output_format srt
)
echo All done. Press any key to Exit. &pause>nul
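
For anyone who prefers a script to a batch file, the same "skip videos that already have an srt" logic can be sketched in Python. This is only an illustration: the helper names are invented, the flags are copied from the batch file above, and nothing is run until you pass the command to `subprocess.run` yourself.

```python
from pathlib import Path
from typing import List


def videos_without_subtitles(folder: Path, pattern: str = "*.mkv") -> List[Path]:
    """Return videos in `folder` that have no .srt file of the same name."""
    return sorted(video for video in folder.glob(pattern)
                  if not video.with_suffix(".srt").exists())


def command_for(video: Path, exe: str = r"D:\whisperf\whisper-faster.exe") -> List[str]:
    """Build the same command line the batch file uses for one video."""
    return [exe, str(video),
            "--language=English", "--model=medium.en",
            "--beam_size", "5",
            "--output_dir", str(video.parent),
            "--output_format", "srt"]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for video in videos_without_subtitles(Path(r"d:\a")):
        print(command_for(video))
```

Printing first makes it easy to check which files would be processed before committing to a long run.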

Unless I get asked a question this is my last post for this thread. I'm starting to bore myself.
Cheers.
Last edited by pcspeak; 22nd Aug 2023 at 16:32.
22nd Aug 2023 16:48 #8
VoodooFX

Originally Posted by pcspeak

When I checked whisper-faster.exe --help at a command prompt, neither of the shortened parameters showed.

It shows all, short and long ones.

Originally Posted by pcspeak

--compute_type int8 (-ct=int8) gives me an empty srt file

Strange, what do you see in the console at the end?

Originally Posted by pcspeak

--beam_size 5 (-bs=5) Gives a definite improvement on accuracy... In the first 5 minutes of the video I found 4 occasions

I tested a whole movie [English] and counted all the better/worse occasions. It was ~fifty-fifty with beam 5 vs 1, so it didn't make the subs more accurate, it only made things slower.

Originally Posted by pcspeak

This is the batch file I'm using

Your batch doesn't do anything that you can't do with whisper-faster.exe alone.


22nd Aug 2023 18:43 #9

pcspeak

Originally Posted by VoodooFX

It shows all, short and long ones.

My bad. You are correct. I ran --help on an earlier version of whisper-faster.exe by mistake.

Originally Posted by VoodooFX

Your batch doesn't do anything that you can't do with whisper-faster.exe alone.

You're right again. I'm just using copy/paste from other batch files. Old habits die hard.

Originally Posted by VoodooFX

Strange, what do you see in the console at the end?

Code:
D:\WhisperF>whisper-faster-Large-GPU v04.cmd

Standalone Faster-Whisper r145.3 running on: CUDA


Starting transcription on: d:\a\my movie.mkv


Transcription speed: 5.78 audio seconds/s

Subtitles are written to 'd:\a' directory.


Operation finished in: 499 seconds

  Press any key to Exit.
None of the usual time codes or text one expects.

The output folder.
Code:
D:\a>dir
  Volume in drive D is D
 Volume Serial Number is 00CE-8685

 Directory of D:\a

23/08/2023  09:34 AM    <DIR>          .
23/08/2023  09:34 AM    <DIR>          ..
11/04/2023  06:31 AM       223,042,765 my movie.mkv
23/08/2023  09:34 AM                 0 my movie.srt
               2 File(s)    223,042,765 bytes
               2 Dir(s)  72,641,765,376 bytes free

D:\a>
Cheers.


22nd Aug 2023 19:04 #10
VoodooFX

Run it normally, without the cmd file, and check what it writes with "--verbose=true" and "-f=all".


22nd Aug 2023 20:19 #11

pcspeak


I killed the transcription. Ctrl+C.

Code:

D:\WhisperF>whisper-faster.exe "d:\a\my movie.mkv" --language=English --model=large-v2 --compute_type=int8 --beam_size 5 --verbose=true --output_dir "D:\a\" --output_format all

Standalone Faster-Whisper r145.3 running on: CUDA

Number of visible GPU devices: 1

Supported compute types by GPU: {'int8', 'float16', 'int8_float32', 'float32', 'int8_float16'}

[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info] CPU: GenuineIntel (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - Selected ISA: AVX2
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - Use Intel MKL: true
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - SGEMM backend: MKL (packed: false)
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - GEMM_S16 backend: MKL (packed: false)
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - GEMM_S8 backend: MKL (packed: false, u8s8 preferred: true)
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info] GPU #0: NVIDIA GeForce GTX 1650 SUPER (CC=7.5)
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - Allow INT8: true
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - Allow FP16: true (with Tensor Cores: true)
[2023-08-23 10:40:33.256] [ctranslate2] [thread 2040] [info]  - Allow BF16: false
[2023-08-23 10:40:51.485] [ctranslate2] [thread 2040] [info] Using CUDA allocator: cuda_malloc_async
[2023-08-23 10:40:51.978] [ctranslate2] [thread 2040] [info] Loaded model D:\WhisperF\_models\faster-whisper-large-v2 on device cuda:0
[2023-08-23 10:40:51.978] [ctranslate2] [thread 2040] [info]  - Binary version: 6
[2023-08-23 10:40:51.979] [ctranslate2] [thread 2040] [info]  - Model specification revision: 3
[2023-08-23 10:40:51.979] [ctranslate2] [thread 2040] [info]  - Selected compute type: int8_float16

Model loaded in: 18.83 seconds

Starting transcription on: d:\a\my movie.mkv

Processing audio with duration 44:00.085

VAD filter removed 00:49.825 of audio
VAD filter kept the following audio segments: [00:00.000 -> 00:56.292], [00:58.716 -> 01:41.412], [01:43.836 -> 04:52.548], [04:54.204 -> 07:46.692], [07:48.156 -> 10:38.052], [10:40.380 -> 11:55.140], [11:56.796 -> 16:09.828], [16:13.020 -> 16:16.164], [16:19.644 -> 18:30.852], [18:32.604 -> 20:04.164], [20:05.724 -> 20:45.828], [20:49.404 -> 22:37.572], [22:40.188 -> 25:46.596], [25:48.252 -> 26:17.220], [26:22.428 -> 38:51.684], [38:54.876 -> 39:05.508], [39:08.316 -> 42:06.468], [42:08.220 -> 43:19.524], [43:22.428 -> 43:55.908]

Audio processing finished in: 22.68 seconds

Processing segment at 00:00.000
[2023-08-23 10:41:16.509] [ctranslate2] [thread 5376] [info] Loaded cuBLAS library version 11.8.1
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 00:29.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 00:58.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 01:27.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 01:56.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 02:25.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 02:54.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 03:23.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 03:52.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 04:21.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 04:50.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 05:19.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 05:48.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 06:17.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 06:46.000
* Compression ratio threshold is not met with temperature 0.0 (5.535714 > 2.400000)
Processing segment at 07:15.000
Traceback (most recent call last):
  File "D:\whisper-fast\__main__.py", line 657, in <module>
  File "D:\whisper-fast\__main__.py", line 605, in cli
  File "faster_whisper\transcribe.py", line 931, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 415, in generate_segments
  File "faster_whisper\transcribe.py", line 651, in generate_with_fallback
KeyboardInterrupt
[5388] Failed to execute script '__main__' due to unhandled exception!

 D:\WhisperF>

With --beam_size 5 removed.

Code:

Processing segment at 00:00.000
[2023-08-23 11:12:07.354] [ctranslate2] [thread 10732] [info] Loaded cuBLAS library version 11.8.1
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 00:30.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 01:00.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 01:30.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 02:00.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 02:30.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 03:00.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 03:30.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 04:00.000
* Log probability threshold is not met with temperature 0.0 (-1.928111 < -1.000000)
* No speech threshold is met (0.633789 > 0.600000)
Processing segment at 04:30.000
Traceback (most recent call last):
  File "D:\whisper-fast\__main__.py", line 657, in <module>
  File "D:\whisper-fast\__main__.py", line 605, in cli
  File "faster_whisper\transcribe.py", line 931, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 408, in generate_segments
  File "faster_whisper\transcribe.py", line 620, in encode
KeyboardInterrupt
 [10968] Failed to execute script '__main__' due to unhandled exception!

I'm now officially out of my comfort zone. But most happy to learn.
Cheers.
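
A note on the "Compression ratio threshold is not met" lines in the first log. As far as I know, the reference Whisper code measures how repetitive a decoded segment is by comparing its raw size with its zlib-compressed size, and treats anything above the threshold (2.4 in the log) as a failed decode. The sketch below assumes that definition; the sample strings are invented for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


THRESHOLD = 2.4  # the value printed in the console output

normal = "Drew Pritchard is one of Britain's leading decorative salvage dealers"
looping = "Thank you. " * 50  # the kind of loop a failing decode tends to produce

for label, text in (("normal", normal), ("looping", looping)):
    ratio = compression_ratio(text)
    verdict = "rejected" if ratio > THRESHOLD else "accepted"
    print(f"{label}: ratio {ratio:.2f} -> {verdict}")
```

Seeing the identical ratio (5.535714) on every segment of the log suggests the model was producing the same repetitive output each time, which would fit with the empty srt file.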


22nd Aug 2023 20:22 #12
VoodooFX

Try with "int8_float32".


22nd Aug 2023 20:56 #13

pcspeak


Congratulations! That's working. I'll run it through to the end and get back to you.

Code:

Starting transcription on: d:\a\my movie.mkv

Processing audio with duration 44:00.085

VAD filter removed 00:49.825 of audio
VAD filter kept the following audio segments: [00:00.000 -> 00:56.292], [00:58.716 -> 01:41.412], [01:43.836 -> 04:52.548], [04:54.204 -> 07:46.692], [07:48.156 -> 10:38.052], [10:40.380 -> 11:55.140], [11:56.796 -> 16:09.828], [16:13.020 -> 16:16.164], [16:19.644 -> 18:30.852], [18:32.604 -> 20:04.164], [20:05.724 -> 20:45.828], [20:49.404 -> 22:37.572], [22:40.188 -> 25:46.596], [25:48.252 -> 26:17.220], [26:22.428 -> 38:51.684], [38:54.876 -> 39:05.508], [39:08.316 -> 42:06.468], [42:08.220 -> 43:19.524], [43:22.428 -> 43:55.908]

Audio processing finished in: 22.6 seconds

Processing segment at 00:00.000
[2023-08-23 11:48:06.036] [ctranslate2] [thread 5980] [info] Loaded cuBLAS library version 11.8.1
[00:00.000 --> 00:08.880]  On salvage hunters best buys. Cheers. Drew looks back at his all-time favorite buys from Northern Europe. Oh
[00:08.880 --> 00:09.960]  My god, look
[00:11.520 --> 00:18.220]  Searching for rare continental pieces in France. He plays on the language barrier to get a steal of a deal
Processing segment at 00:18.220
[00:19.160 --> 00:20.700]  Wait, thank you
[00:27.400 --> 00:33.400]  In Belgium as an exclusive antiques Emporium he's spoiled for choice at a jaw-dropping collection
[00:34.620 --> 00:38.060]  Wow the death mask Napoleon is it Wow
[00:39.420 --> 00:43.380]  In Amsterdam, he's enchanted by an ancient Greek poetess
[00:43.980 --> 00:47.460]  This is stunning. Look at that. What a thing
Processing segment at 00:48.220
[00:48.220 --> 00:54.680]  These are Drew's favorite Northern European hunting grounds. Oh, yeah. Now you're talking about that. Yeah a grand bazaar
[00:59.080 --> 01:02.540]  Drew Pritchard is one of Britain's leading decorative salvage dealers
[01:03.080 --> 01:07.360]  Stop here for quality and fun in his hunt for weird and wonderful objects
[01:08.200 --> 01:11.120]  What's a fabulous thing? That's something I've never seen before
[01:11.680 --> 01:14.000]  He scoured the country and the continent
Processing segment at 01:11.580
[01:14.520 --> 01:21.480]  Merci, merci, that's got on salvage hunters best buys. He takes us inside his most remarkable deals
[01:22.080 --> 01:22.960]  1550 what I do
[01:23.700 --> 01:25.260]  He's not that's cracking
[01:26.020 --> 01:28.820]  revealing his favorite purchases Wow
[01:28.820 --> 01:36.340]  Seriously impressive places. Oh my word. I just don't know what to look first and people you can buy one piece out there
[01:36.340 --> 01:38.360]  But I'm gonna charge a lot of money for it
Processing segment at 01:35.940
[01:44.150 --> 01:44.750]  Oh
[01:44.750 --> 01:47.670]  Where's 30-year career in the antique and salvage trade
[01:48.350 --> 01:50.010]  450 euros. Yes, so
[01:51.210 --> 01:56.290]  That's nice drew has traveled far and wide across the continent. Hello
[01:57.110 --> 01:57.490]  incredible
[01:58.650 --> 02:04.910]  To bring the best European decorative antiques back to the UK. I try not to hit the Octotrion
[02:06.590 --> 02:09.510]  Yeah, let's see 800
Processing segment at 02:04.660
[02:10.310 --> 02:10.930]  Let's see
[02:11.530 --> 02:17.110]  Traveling around Europe buying antiques. Yes. It is as good as it sounds it really is
[02:17.110 --> 02:19.930]  For me going there. It's like a melting pot
[02:19.930 --> 02:24.290]  I'm really never know what I'm gonna find you can find some really exceptional things if you look around
[02:24.290 --> 02:27.750]  All of the things in Belgium all in one place Wow
[02:28.830 --> 02:32.670]  But there's a special place in Drew's heart for the northern part of the continent
[02:32.670 --> 02:37.250]  It was one of the major trade routes of the 18th and 19th century
Processing segment at 02:32.400
Traceback (most recent call last):
  File "D:\whisper-fast\__main__.py", line 657, in <module>
  File "D:\whisper-fast\__main__.py", line 605, in cli
  File "faster_whisper\transcribe.py", line 931, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 408, in generate_segments
  File "faster_whisper\transcribe.py", line 620, in encode
KeyboardInterrupt
[2332] Failed to execute script '__main__' due to unhandled exception!

Cheers.


22nd Aug 2023 20:58 #14

pcspeak


It crashed.

Code:

[03:17.330 --> 03:20.710]  Came back did really well and I've been coming ever since
[03:21.870 --> 03:23.330]  in La Belle France
[03:23.330 --> 03:30.430]  Drew and T scoured the countryside for some typically galaxy in one of the country's many secondhand shops or brocons
[03:32.850 --> 03:35.310]  The French have an incredible sort of
Processing segment at 03:30.460
[03:35.310 --> 03:41.030]  Love of brocons, it's commonplace, but it's in their culture
[03:41.030 --> 03:46.610]  So you get to go to great shops and brocons all over the country
[03:46.610 --> 03:50.530]  They're everywhere and they're full generally rammed and I like them that way
[03:52.950 --> 04:00.890]  On the outskirts of Rouen in Normandy Drew visited a huge three-story barn stuffed with eclectic beautiful and curious items
[04:00.890 --> 04:03.510]  many sourced from local chateau and farms
Processing segment at 04:00.460
[04:05.810 --> 04:13.210]  The authentic brocon shop experience they encountered is the result of owner Max Teterland's passion for French provincial antiques
[04:14.730 --> 04:18.870]  Max has been a dealer I think 40 years today. I'm looking for
[04:19.590 --> 04:23.310]  Garden stuff predominantly and then one sort of rustic
[04:24.050 --> 04:27.810]  Steel work, you know all that sort of stuff you can pick up around here. Yeah
[04:28.510 --> 04:32.090]  She did my rail a la brocon the board. Yeah
Processing segment at 04:27.240
* Log probability threshold is not met with temperature 0.0 (-1.334731 < -1.000000)
Traceback (most recent call last):
  File "D:\whisper-fast\__main__.py", line 657, in <module>
  File "D:\whisper-fast\__main__.py", line 605, in cli
  File "faster_whisper\transcribe.py", line 931, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 415, in generate_segments
  File "faster_whisper\transcribe.py", line 651, in generate_with_fallback
RuntimeError: CUDA failed with error out of memory
[6356] Failed to execute script '__main__' due to unhandled exception!


22nd Aug 2023 21:01 #15
VoodooFX

Close all programs that use the GPU, including any internet browser, and maybe restart the PC. If that doesn't help, set the beam size to 1.

Btw, can you share this audio [remuxed with mkvtoolnix without video]?

Last edited by VoodooFX; 22nd Aug 2023 at 21:15.

22nd Aug 2023 21:13 #16
VoodooFX

Actually, it runs out of memory not because of the higher beam size, but because of the fallback, which is when "--best_of" comes into play. It's 5 by default; you can try lowering it.
If it still runs out of memory, then you can disable the fallback -> "--temperature_increment_on_fallback=None".
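
For readers wondering what "fallback" means here, this is a rough Python sketch of the idea as I understand it from the reference Whisper code: decode at temperature 0.0 first, and if the result looks bad, retry at higher temperatures. Everything in it is simplified. `fake_decode` is a hypothetical stand-in for the real decoder, the no-speech check is left out, and memory use is not modelled at all. The thresholds 2.4 and -1.0 are the ones shown in the logs earlier in the thread; the 0.2 increment is my assumption of the usual default.

```python
import zlib
from typing import Callable, List, NamedTuple, Optional


class Attempt(NamedTuple):
    text: str
    avg_logprob: float
    temperature: float


def compression_ratio(text: str) -> float:
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def fallback_temperatures(increment: Optional[float]) -> List[float]:
    """Temperatures to try, in order. None disables the fallback entirely."""
    if increment is None:
        return [0.0]
    steps = int(round(1.0 / increment))
    return [round(i * increment, 2) for i in range(steps + 1)]


def decode_with_fallback(decode: Callable[[float], Attempt],
                         temperatures: List[float],
                         max_compression_ratio: float = 2.4,
                         min_avg_logprob: float = -1.0) -> Attempt:
    """Try each temperature until one attempt passes both checks."""
    attempt = None
    for temperature in temperatures:
        attempt = decode(temperature)
        repetitive = compression_ratio(attempt.text) > max_compression_ratio
        unlikely = attempt.avg_logprob < min_avg_logprob
        if not (repetitive or unlikely):
            break
    return attempt


def fake_decode(temperature: float) -> Attempt:
    # Hypothetical decoder: loops at temperature 0.0, recovers once sampling starts.
    if temperature == 0.0:
        return Attempt("Thank you. " * 50, -0.3, temperature)
    return Attempt("Everything has its price.", -0.4, temperature)


with_fallback = decode_with_fallback(fake_decode, fallback_temperatures(0.2))
without_fallback = decode_with_fallback(fake_decode, fallback_temperatures(None))
print(with_fallback.temperature, without_fallback.temperature)
```

With the fallback disabled there is only one attempt per segment, so the retries (and whatever extra memory they need) never happen. The price is that a bad first attempt is kept as-is.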

22nd Aug 2023 22:39 #17
pcspeak

I sent you a pm.

--temperature_increment_on_fallback=None
That worked. The error rate is high compared to the medium.en model using the following:

Code:

whisper-faster.exe "d:\a\*.mkv" --language=English --model=medium.en --compute_type=int8_float32 --beam_size 5 --output_dir D:\a\ --output_format srt

And much slower.
23rd Aug 2023 14:09 #18
VoodooFX

Is "--model=medium.en --compute_type=int8_float16" working normally on that file?


23rd Aug 2023 15:03 #19

pcspeak


Yes, it does work.
With nothing else changed, what's interesting is the punctuation.
int8_float32 gives me a period at the end of each sentence.

--model=medium.en --compute_type=int8_float16

Code:

Processing segment at 18:27.040
[18:47.420 --> 18:48.280]  No
[18:50.040 --> 18:51.240]  Everything has its price
[18:51.240 --> 18:51.920]  I know
[18:51.920 --> 18:52.980]  Yes you're very right
[18:52.980 --> 18:54.320]  You're very right
[18:55.620 --> 18:56.100]  God
[18:56.580 --> 18:57.540]  Oh what the heck
[18:58.340 --> 18:59.020]  Thank you
[18:59.020 --> 19:00.420]  This is a really good piece
[19:00.420 --> 19:01.080]  It is
[19:01.080 --> 19:01.960]  Thank you so much
[19:01.960 --> 19:02.900]  We appreciate it
[19:03.440 --> 19:04.080]  Appreciate it
[19:04.080 --> 19:05.200]  Wonderful thing
[19:05.200 --> 19:07.120]  One of the nicest things I've ever bought
[19:07.120 --> 19:08.840]  One of the nicest things I've ever bought
[19:08.840 --> 19:09.260]  Wonderful
[19:09.260 --> 19:11.300]  Honestly it's one of the best things I've ever bought
[19:11.300 --> 19:15.400]  It had that certain magic to it
Processing segment at 18:55.020

--model=medium.en --compute_type=int8_float32

Code:

Processing segment at 18:38.540
[18:58.920 --> 19:00.600]  This is a really good piece, sir.
[19:00.700 --> 19:01.040]  It is.
[19:01.240 --> 19:01.940]  Thank you so much.
[19:02.200 --> 19:02.860]  We appreciate it.
[19:03.520 --> 19:04.080]  Appreciate it.
[19:04.240 --> 19:04.880]  Wonderful thing.
[19:05.740 --> 19:07.100]  One of the nicest things I've ever bought.
[19:07.600 --> 19:08.800]  One of the nicest things I've ever bought.
[19:08.960 --> 19:09.260]  Wonderful.
[19:09.600 --> 19:11.280]  Honestly, it's one of the best things I've ever bought.
[19:12.080 --> 19:15.240]  It had that certain magic to it.
[19:15.540 --> 19:18.340]  That single piece makes all of those hours
[19:18.340 --> 19:20.480]  and all of that travel worth it.
[19:21.040 --> 19:23.500]  And I was particularly pleased with the deal I did on it.
[19:23.720 --> 19:25.760]  I did give the dealer a little bit of a kicking.
[19:26.560 --> 19:27.480]  He still made a profit.
[19:27.760 --> 19:28.680]  I got what I wanted.
Processing segment at 19:08.300


23rd Aug 2023 16:24 #20
VoodooFX

OK.
Could you test "r143" and "r145" with "--model=large-v2 --compute_type=int8"?

I uploaded these versions here: https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

23rd Aug 2023 20:40 #21
pcspeak

The output

Attached Files

whisper-faster_r143-console.txt (3.4 KB, 171 views)

whisper-faster_r145-console.txt (4.5 KB, 122 views)
27th Aug 2023 17:58 #22
loninappleton

Member
Join Date: Jun 2005
Location: USA

I think pcspeak was going to close the thread but...

Just a word of thanks to the two main coders (so far as I know) in this sub-forum for their continuing improvement of this.

Yes, I would have to upgrade my GPU, but there was mention of NVIDIA vs Radeon. I'm a way off from adding cards until I see a proven need for the upgrade.

Onward.

15th Sep 2023 15:27 #23
Ejdehan

Member
Join Date: Jul 2022

How does int8_float32 on CPU differ from int8 or float32?

15th Sep 2023 15:38 #24
VoodooFX

Originally Posted by Ejdehan

How does int8_float32 on CPU differ from int8 or float32?

The old "int8" was the same as "int8_float32"; now "int8" auto-selects one of the three "int8_..." variations.
"float32" requires about twice as much memory as "int8_...", and model loading is much faster with "float32".

Btw, on my CPU, transcription with "float32" is about twice as fast as with "int8_float32".

There is documentation about the different quantizations -> https://opennmt.net/CTranslate2/quantization.html
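
As a rough back-of-the-envelope check on why the weight type matters for a 4 GB card, the Python sketch below estimates the size of the model weights alone. The parameter counts are the approximate figures published for the Whisper models. The helper name and the one-number-per-type simplification are my own assumptions, and real memory use is higher because activations, beam search state and the CUDA libraries also need room, so measured ratios like the "about twice" above will not match these numbers exactly.

```python
# Approximate parameter counts, in millions, as published for the Whisper models.
PARAMS_MILLIONS = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large-v2": 1550}

# Bytes used to store one weight for each storage type.
BYTES_PER_WEIGHT = {"int8": 1, "float16": 2, "float32": 4}


def weight_gb(model: str, weight_type: str) -> float:
    """Estimated size of the model weights only, in GB (10^9 bytes)."""
    return PARAMS_MILLIONS[model] * 1_000_000 * BYTES_PER_WEIGHT[weight_type] / 1e9


for model in ("medium", "large-v2"):
    for weight_type in ("int8", "float16", "float32"):
        print(f"{model:9s} {weight_type:8s} ~{weight_gb(model, weight_type):.2f} GB")
```

On these estimates the large model's weights take roughly 3.1 GB as float16 but about 1.55 GB as int8, which fits with the reports earlier in the thread that large only runs on a 4 GB card with an int8 compute type.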

30th Oct 2023 12:25 #25
conchimnon

Member
Join Date: Oct 2023

Hello, I'm using a computer with an i5 9400 CPU and an RX6600 GPU.

So, when using Whisper-Faster, how should I set it up for the best performance?

Thank you, everyone.
