Steps & Settings for use of Whisper in Subtitle Edit ?


18th Sep 2023 00:51 #31
Seeker47

Member

Join Date
Jul 2005

Location
drifting, somewhere on the Sea of Cynicism
I basically stumbled my way through the menus until I found a basic work process that seems to have proven successful for me, at least so far. But for anyone else who may be interested in more formal, detailed guides, I did find this online:

https://www.notta.ai/en/blog/how-to-use-whisper

When in Las Vegas, don't miss the Pinball Hall of Fame Museum http://www.pinballmuseum.org/ -- with over 150 tables from 6+ decades of this quintessentially American art form.

20th Sep 2023 05:25 #32
Subtitles

Member

Join Date
Mar 2021

Location
Israel
It is a good tutorial. Thanks for sharing.
I have been using Whisper AI for about a year, and a GPU speeds things up a lot.
If you want to try Whisper AI with a GPU anyway, and you have a working Google Drive, try Google Colaboratory (Jupyter). You will need to install Whisper AI again there, but you have already done that on your PC, so it should be easy for you. You will be able to use a free GPU, within certain limits. Try transcribing a short video, starting with the smallest models and moving upwards to medium or large.
I haven't used it myself because my Google Drive is corrupted. Let us know if it works for you.
You might find this video helpful:
https://www.youtube.com/watch?v=wrSelk44_Js&ab_channel=MathsChelsea
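
The "smallest model first" approach could look roughly like this in a Colab or local Python session. This is only a minimal sketch, assuming the reference openai-whisper package (pip install -U openai-whisper) rather than anything built into Subtitle Edit; the helper names and the file name clip.wav are made up for illustration.

```python
# Model names of the reference openai-whisper package, smallest first.
MODEL_LADDER = ["tiny", "base", "small", "medium", "large"]


def models_up_to(largest):
    """Return the ladder from the smallest model up to and including `largest`."""
    return MODEL_LADDER[: MODEL_LADDER.index(largest) + 1]


def try_models(audio_path, largest="medium"):
    """Transcribe the same clip with each model in turn and print the text.

    Assumes openai-whisper is installed; it is imported here so that the
    helper above can be used without it.
    """
    import whisper

    for name in models_up_to(largest):
        model = whisper.load_model(name)  # downloads the model on first use
        result = model.transcribe(audio_path)
        print(f"--- {name} ---")
        print(result["text"])


# Example (hypothetical file name): try_models("clip.wav", largest="small")
```

Running the same short clip through each size makes it easier to judge where the extra waiting time stops paying off.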

Last edited by Subtitles; 20th Sep 2023 at 05:31.

20th Sep 2023 12:16 #33
Seeker47

Thanks for the suggestion. I've ordered a video card that has 4 GB of VRAM (probably not enough?), which might fit and work in this computer setup, so I expect to be giving that a try. I've only seen mentions of Google Drive and have no experience with it at all.

Right now, on CPU only, I have a Whisper job that has been running for 10 hours so far, yet the Whisper log has only reached about 400 bytes in size. Should I take that as an indication that Whisper has stalled and given up? Does the log only get written to at the very end? Every previous job here of about the same size completed overnight.
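
One rough way to check whether a file is still being written is to compare its size and modification time before and after a wait. The sketch below uses only the Python standard library; the function name is made up, and the path you pass is whatever log or output file you want to watch. It is only informative if the tool writes incrementally: if the log is written in one go at the end, a file that is not growing proves nothing about a stall.

```python
import os
import time


def file_is_growing(path, wait_seconds=60.0, sleep=time.sleep):
    """Return True if the file's size or modification time changes while we wait."""
    before = os.stat(path)
    sleep(wait_seconds)  # `sleep` is a parameter only so it can be swapped out
    after = os.stat(path)
    return (after.st_size, after.st_mtime_ns) != (before.st_size, before.st_mtime_ns)
```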

20th Sep 2023 12:32 #34
Subtitles

Are you using Subtitle Edit for this job or the Whisper AI command line?
Which model have you selected to do the transcription?
You can try the smallest model, see how it goes, and move up to a larger one once it finishes.
The medium and large models are not going to work on your PC. You will see the difference when you install the GPU.

20th Sep 2023 13:50 #35
Seeker47

It just seems to be spinning its wheels on this job. Nothing has shown up yet in the upper left processing window, which I don't think was the case for the previous jobs.

I'm doing this under Subtitle Edit, which then hands off to Whisper, using the Large model, as was the case for the previous 6 jobs. Those were all approximately the same size, with original French or German audio. (No, wait, there was one that was Japanese.) All of those succeeded, taking around 8 hours each to complete. Results ranged from satisfactory to quite good. I wasn't seeing any clear reason to deviate from this template that had worked several times, the only variable being the spoken language. But all of those had pretty clean soundtracks, with good recording and nothing much to interfere with that. I had not previewed the sound for this one so I don't know, but will go back to see where it stands.

20th Sep 2023 14:34 #36
Seeker47

After 12 hours and zilch to show for it, I pulled the plug on that job. May try it again after that video card arrives in a few days. Based on a quick check, I noted no obvious defects in the audio -- no people talking over each other, or poor recording, or obscuring background sound. So that was officially a first failure, once I'd worked out the rudiments of getting Whisper AI going.

20th Sep 2023 14:41 #37
Subtitles

Start again and use the small model. At least you will get something, even if it is not very accurate.

21st Sep 2023 11:51 #38
Seeker47

I'll probably defer on this until that video card comes in -- enough other things going on in the interim anyway.

For the sake of comparison, has anyone experimented with DeepL? If so, was it any good for translating? I had their Windows standalone app installed, but only went to check it out for the first time a few days ago. At first the app could not be found (it had installed itself far under Users on C:, normally a place I would never install anything), and then when I tried to run it from that location it promptly uninstalled itself. It's always possible that I made some mistake. Seemingly a dead end, nonetheless.

21st Sep 2023 12:00 #39
Seeker47

@Subtitles,

Have you seen that bar graph which shows all the languages, and relatively how well Whisper performed in translating them?
If not, I'm sure I can find the link and post it here.

21st Sep 2023 12:24 #40
Subtitles

Link
https://github.com/openai/whisper

I prefer to use the term transcription rather than translation, simply because I can check the final result while listening, even in different languages.
For translation there are several options, including running Whisper again.
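
The two modes correspond to Whisper's --task option: "transcribe" keeps the spoken language, while "translate" produces English. As a rough illustration, here is a small helper that builds the argument list for the reference whisper command line; the helper name is made up, and only the --language, --model and --task options are used.

```python
def whisper_args(audio_path, language, model, task="transcribe"):
    """Build an argument list for a Whisper command-line run.

    "transcribe" keeps the spoken language; "translate" outputs English.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    return ["whisper", audio_path, "--language", language, "--model", model, "--task", task]
```

For example, whisper_args("film.wav", "fr", "small") gives a French transcription run, and passing task="translate" gives the English version of the same job (film.wav is a placeholder name).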

Last edited by Subtitles; 22nd Sep 2023 at 04:42.

1st Oct 2023 00:12 #41
Seeker47

Well, I tried again, this time on an otherwise identical computer that has double the system memory (32 GB, which maxes it out, vs. 16), plus an MSI video card that provides the GPU, along with 4 GB of VRAM. And I'm baffled at how I'm now hitting a brick wall of greatly inferior results. The error messages make no sense to me at all, so I hope they might be revealing to someone else here.

-------------------------------------------------------------------------------------------------------------------

Job Failure 1:

SE: 4.0.0.0 - Microsoft Windows NT 10.0.19045.0 - 64-bit
Message: Calling whisper (Purfview's Faster-Whisper) with : C:\Portable SW\Subtitle Edit 4\Whisper\Purfview-Whisper-Faster\whisper-faster.exe --language fr --model "large-v2" --task translate "C:\Users\User1\AppData\Local\Temp\1708af13-1edd-4ff4-8d8e-b003f2a69c6a.wav"
Standalone Faster-Whisper r149.1 running on: CUDA
RuntimeError: CUDA failed with error out of memory

File "faster_whisper\transcribe.py", line 129, in __init__
File "D:\whisper-fast\__main__.py", line 537, in cli
File "D:\whisper-fast\__main__.py", line 655, in <module>
Traceback (most recent call last):
[9844] Failed to execute script '__main__' due to unhandled exception!
Calling whisper Purfview's Faster-Whisper done in 00:00:21.8710278
Loading result from STDOUT

--------------------------------------------------------------------------------------------------------------------

Job Failure 2:

SE: 4.0.0.0 - Microsoft Windows NT 10.0.19045.0 - 64-bit
Message: Calling whisper (Purfview's Faster-Whisper) with : C:\Portable SW\Subtitle Edit 4\Whisper\Purfview-Whisper-Faster\whisper-faster.exe --language de --model "large-v2" --task translate "C:\Users\User1\AppData\Local\Temp\85f936ba-c6b6-4964-b054-7165bb3b7841.wav"
Standalone Faster-Whisper r149.1 running on: CUDA
RuntimeError: CUDA failed with error out of memory

File "faster_whisper\transcribe.py", line 129, in __init__
File "D:\whisper-fast\__main__.py", line 537, in cli
File "D:\whisper-fast\__main__.py", line 655, in <module>
Traceback (most recent call last):
[4812] Failed to execute script '__main__' due to unhandled exception!
Calling whisper Purfview's Faster-Whisper done in 00:00:21.3626655
Loading result from STDOUT

---------------------------------------------------------------------------------------------------------------------

WHAT does this mean?? I thought that CUDA must be a native feature of most video cards like this. (?)

How the hell could I be running out of memory, when I never did on the original (lesser) rig that lacked these supposed advantages?

[The above jobs were attempted using my existing "template" of Faster Whisper + Large Model.]

Last edited by Seeker47; 1st Oct 2023 at 00:25.

1st Oct 2023 00:23 #42
Seeker47

Another situation, and a different question. I've got at least a couple of videos where there seem to be hard (burned-in) EN subs, but some "genius" had the bright idea of placing them at the upper left of the screen, in about a 3-point font size, rendering them pretty much unreadable. What would be the best approach to extract them, transform them to a readable font size, and replace them in the video? (Preferably at the bottom, where they belong.) Can SE even extract and save hard subs, or would that need some other tool? Failing that, they are easy enough to ignore as is, so generating an external .SRT file of them might be just as good. But I'm wondering if such a solution would just circle back to Whisper?
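
For reference, an external .SRT file is just plain text: a cue number, a timing line in the form HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and a blank line between cues. A minimal sketch of writing one cue in Python follows; the function names are made up for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index, start, end, text):
    """Return one numbered SRT cue; cues in a file are separated by a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

Since the text is stored separately from the picture, the player decides where and how large to draw it, normally at the bottom of the screen.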

1st Oct 2023 02:40 #43
Subtitles

What is the video's duration?
What is the video's resolution?
Hard subs are not easy to extract, but there is a way if you are willing to spend a few hours.
SE can't extract hard subs.
Your best solution is to use Whisper to generate the subtitles.
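
If someone does want to try the extraction route, one possible first step (an assumption on my part, not necessarily the method meant above) is to dump cropped frames of the subtitle area with ffmpeg and feed those images to an OCR tool. The sketch below only builds the ffmpeg command; the function names, the crop fractions and the 1280x720 frame size in the example are placeholders, and the OCR step itself is not shown.

```python
def top_left_crop_filter(width, height, frac_w=0.5, frac_h=0.25):
    """Return an ffmpeg crop filter, crop=w:h:x:y, covering the top-left corner."""
    crop_w = int(width * frac_w)
    crop_h = int(height * frac_h)
    return f"crop={crop_w}:{crop_h}:0:0"


def frame_dump_command(video, out_pattern, width, height, fps=1):
    """Build an ffmpeg command that saves cropped frames as images for OCR."""
    vf = f"{top_left_crop_filter(width, height)},fps={fps}"
    return ["ffmpeg", "-i", video, "-vf", vf, out_pattern]


# Example with placeholder names and a placeholder 1280x720 frame size:
# frame_dump_command("input.mkv", "frames/%05d.png", 1280, 720)
```

Cropping to the corner where the text sits keeps the images small and gives the OCR less background to trip over.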

1st Oct 2023 02:44 #44
Subtitles

Try the medium model. The large model is probably asking too much of the GPU.
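
Some background on the error: "CUDA failed with error out of memory" refers to the video card's own memory (VRAM), not system RAM, which is why a CPU-only machine with plenty of RAM can load a model that a 4 GB card cannot. As a rough guide, the openai/whisper README lists approximate VRAM needs per model for the reference implementation. The sketch below turns that table into a lookup; the function name is made up, and Faster-Whisper generally needs less than these figures, so treat the result as a conservative starting point rather than a hard limit.

```python
# Approximate VRAM needs from the openai/whisper README (reference implementation).
REQUIRED_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(vram_gb):
    """Return the largest model whose listed VRAM need is within `vram_gb`, or None."""
    fitting = [name for name, need in REQUIRED_VRAM_GB.items() if need <= vram_gb]
    return fitting[-1] if fitting else None
```

By these numbers a 4 GB card is safely in "small" territory; medium is worth a try with Faster-Whisper, but large is well out of reach.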

1st Oct 2023 12:40 #45
Seeker47

Now this is really strange, but interesting: I repeated Job #1 on the original, much less well-equipped computer that has no GPU. There was no visible result for a long time, so I almost aborted it, thinking that it was never going to complete, which has happened on a couple of other jobs before. (Evidently the Whisper log does not get written until the very end, which would clarify one thing I was wondering about? OTOH, on a couple of past jobs I could have sworn that subs text was gradually appearing in the SE box at upper left, so I can't be sure about that.)

Anyway, I decided to let this job run overnight. The next morning, lo and behold, there was what looks to be a full .SRT file! No idea how this managed to work on the "lesser" rig, which has an i5 3550 and no GPU, but not on the maxed-out one, which has an i7 3770 CPU and the MSI video card with 4 GB of dedicated VRAM. The system with the lesser CPU did not get overwhelmed!

For the last two jobs I had described above, having the same failures, those efforts bombed out within the first 30 seconds. My SE / Whisper settings have thus far remained constant.

Video Job #1 Duration: 57 minutes.
Video Resolution, per MediaInfo: 516 x 478, Matroska, AVC (High @ L4.1, CABAC)

I am certainly willing to experiment with a Medium Model instead of the Large . . . if that could actually be the variable that matters here.
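
To test whether the model size really is the variable, the same job can be retried with progressively smaller models until the CUDA out-of-memory line no longer appears. The sketch below reuses only the options visible in the Subtitle Edit log above (--language, --model, --task); the function names are made up, the paths are placeholders, and it checks for that one error marker only, so a run without it is not guaranteed to have succeeded.

```python
import subprocess

OOM_MARKER = "CUDA failed with error out of memory"


def run_whisper(exe, wav, language, model, task="translate"):
    """Run one job and return its combined stdout/stderr text."""
    cmd = [exe, "--language", language, "--model", model, "--task", task, wav]
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.stdout + done.stderr


def first_model_without_oom(models, run_job):
    """Try models in order; return (model, output) for the first run without a CUDA OOM."""
    for model in models:
        output = run_job(model)
        if OOM_MARKER not in output:
            return model, output
    return None, ""


# Example with placeholder paths:
# first_model_without_oom(
#     ["large-v2", "medium", "small"],
#     lambda m: run_whisper("whisper-faster.exe", "audio.wav", "fr", m),
# )
```

Since the failed jobs bombed out within the first 30 seconds, walking down the list this way costs very little time.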

Are there any specific CUDA settings one needs to have Subtitle Edit set for, in this Whisper scenario?

On another of those cases that has the minuscule EN hardsubs (and in the wrong screen location), I think I will be wanting to extract them.

Last edited by Seeker47; 1st Oct 2023 at 12:52.

18th May 2024 00:32 #46
Seeker47

Interested in any comments about or user experiences with the new options under Subtitle Edit, specifically the Large V3 language model and the added alternatives to Purfview's Faster Whisper. (Which has itself been updated a couple times fairly recently.)
