I've had OK luck using Whisper's small model on some shorter pieces, but a full-length play in MKV repeatedly hung after a number of hours. Yes, the job ran for a long time on a PC with no other activity: the HD light stays on and everything else freezes. Since it was on overnight, I'd say it had been running for about 24 hours. I expected a long run for this, but I've now seen the problem twice, so I'll hold off on trying it again.
Previously I had seen SE freeze-ups in normal subtitle editing, and I could get it going again with a reboot after running SFC (System File Checker) in Windows 10.
I found that solution by accident, but perhaps it points to some other problem. Unfortunately, as far as I'm aware, no error reporting takes place that is visible. Is there a log somewhere for the Whisper or VOSK options?
Yes, there is a file called error_log.txt in the SE folder.
You could also check the system event log (run eventvwr.msc from the Run box) and look at the Application and System events at the corresponding times.
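If you'd rather not dig through the folder by hand, here is a small Python sketch that shows the end of error_log.txt. It assumes an installed (not portable) SE, which keeps its data, including the Whisper files, under %APPDATA%\Subtitle Edit; a portable copy would keep the log next to SubtitleEdit.exe instead. The helper names are made up for this example.

```python
import os
from collections import deque
from pathlib import Path


def default_se_log_path():
    # Assumption: installed SE keeps error_log.txt in %APPDATA%\Subtitle Edit.
    appdata = os.environ.get("APPDATA", "")
    return Path(appdata) / "Subtitle Edit" / "error_log.txt"


def tail_lines(lines, n=40):
    """Keep only the last n lines of any iterable of text lines."""
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def tail_log(path, n=40):
    """Return the last n lines of a log file, or an empty list if it is missing."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8", errors="replace") as f:
        return tail_lines(f, n)
```

For example, `print("\n".join(tail_log(default_se_log_path())))` prints the most recent 40 lines, which is usually enough to see the entries written around the time of a freeze.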
I'll take a look at that. I wasn't aware that the special dialog that opens for VOSK or Whisper reports to the SE logs.
You should use the GPU instead of the CPU.
Depending on your GPU, you can get better and faster transcription; with an 8GB GPU, for example, you can use at least the medium model.
You can help it by isolating the speech with Spleeter and feeding that to Whisper AI. That way there are no confusing surrounding noises.
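A rough sketch of the Spleeter step, assuming the Spleeter 2.x command line and its pretrained spleeter:2stems model, which splits a track into vocals and accompaniment and by default writes them to <output folder>/<input name>/vocals.wav and accompaniment.wav. The function only builds the command and the expected vocals path; spleeter_vocals_command is a hypothetical name.

```python
from pathlib import Path


def spleeter_vocals_command(audio_file, output_dir="separated"):
    """Build a Spleeter 2-stem command and the path where vocals.wav should appear.

    Assumption: Spleeter 2.x CLI with its default output naming
    ({filename}/{instrument}.{codec}).
    """
    audio_file = Path(audio_file)
    cmd = [
        "spleeter", "separate",
        "-p", "spleeter:2stems",  # vocals + accompaniment
        "-o", str(output_dir),
        str(audio_file),
    ]
    vocals = Path(output_dir) / audio_file.stem / "vocals.wav"
    return cmd, vocals
```

You would run the command with `subprocess.run(cmd, check=True)` and then give the resulting vocals.wav to Whisper instead of the original mix.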
Please check the attached Whisper AI help file. There are many options there that can help you prevent it from freezing.
I found that this option helps the most.
Hello Subtitle, and thanks.
I took a quick look at the attachment. In the past I've run into errors because I don't understand the command line or the errors it generates. That's why I use SE. I still have to look at that log as well.
So far no one has said why running SFC /scannow would clear an SE freeze-up. That is also peculiar.
Whisper AI needs a lot of resources, especially RAM, GPU and CPU. It also needs all other applications closed while transcribing.
I have an Intel i5, 16GB of RAM and a GPU with 8GB, and it can still freeze on me if I so much as breathe.
The command-line option is very easy to set up. If you intend to give it a try, you must have Python 3.10.7; please note that the latest Python version will not work well with Whisper AI.
I have tried transcribing the lyrics of the song Ventura Highway by America with different Whisper AI models.
The highest I could go is medium, because my GPU has only 8GB and going higher would mean replacing the power supply with one of at least 1000W.
As you can see, only the medium transcription has just a few errors.
Not much is showing in it, however. I've copied in the two most recent starts of the program.
But I think there is something else going on, with long load times etc. I've repeatedly turned off Windows Update in services.msc, only to have it come back. I turned it off again.
Perhaps with no cold restarts it will stay disabled, and I will retry getting Whisper to complete my play file. It is long and contains 18th-century English. All that might slow things down. I'm just trying to see if I can avoid the freeze-ups. I'll start that today...
The PC running the job has only this task going. For other processes I'd have to check msconfig. Another possibility is to pull the Ethernet plug when I'm doing one of these. Any suggestions on paring down CPU activity are welcome.
I will pull the Ethernet just to see how it goes.
Date: 06/06/2023 23:00:56
SE: 18.104.22.168 - Microsoft Windows NT 10.0.19045.0 - 64-bit
Message: C:\Users\lon\AppData\Roaming\Subtitle Edit\Whisper\main.exe --language en --model "C:\Users\lon\AppData\Roaming\Subtitle Edit\Whisper\Models\small.en.bin" --output-srt --print-progress "C:\Users\lon\AppData\Local\Temp\a3442e20-61f4-4b9a-8a6d-d6a880b76750.wav"
Date: 06/06/2023 23:02:20
SE: 22.214.171.124 - Microsoft Windows NT 10.0.19045.0 - 64-bit
Message: C:\Users\lon\AppData\Roaming\Subtitle Edit\Whisper\main.exe --language en --model "C:\Users\lon\AppData\Roaming\Subtitle Edit\Whisper\Models\small.en.bin" --output-srt --print-progress "C:\Users\lon\AppData\Local\Temp\09895080-c665-4df2-81d0-5a3ab2fb378a.wav"
Try doing this job on Google Notebook (Google Colab). I haven't used it myself because I have trouble with Google Drive, but a few friends used it and were pleased with it.
Your computer freezes because it doesn't have enough resources. Unfortunately, the log doesn't tell you that.
I will have a look at Google Notebook, though I'd never heard of it.
Today I'm making another attempt to run the job in SE Whisper with the Ethernet unplugged and all services shut down through msconfig.
While the job runs I have Task Manager open, and CPU activity is flat at about 50% usage. With that going, I'll be able to see if data stops and freezes.
Keep in mind that I saw data flowing the old way for the better part of a day, then something froze. I have the idea that Win10 uses its services to interfere.
Not an outrageous thought when you read about all the snooping and such that Windows routinely has running. Open Services and start reading a few; there are over 100.
However, if I want to connect the Ethernet again, the services will have to resume.
You can try VideoStudio Pro 2023 or VideoStudio Ultimate 2023; they have a 30-day fully functional free trial, and the software includes a Speech to Text converter.
I tried it and it is relatively simple to use. Apparently it is based on VOSK. It should run smoothly without freezing unless your computer has other issues.
The job has been going since yesterday, so we'll see what the result is.
As for VOSK, that has always worked well in SE.
Using Task Manager to monitor provides what the SE routine does not: run time, any spikes in activity (none), and so on.
Thanks for the help on this.
Apparently SE can't handle the small model because it needs 2GB of VRAM (video RAM).
Try the tiny or base models and see if you get a better response.
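To put rough numbers on the VRAM point: the openai-whisper README lists approximate VRAM needs of about 1GB for tiny and base, 2GB for small, 5GB for medium and 10GB for large. The sketch below just picks the largest model that fits a given amount of video memory. Those figures are for the Python/GPU version; SE's main.exe may use memory differently, so treat this as a rough guide only. largest_model_for is a hypothetical helper.

```python
# Approximate VRAM needs (GB) as listed in the openai-whisper README.
WHISPER_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def largest_model_for(vram_gb):
    """Return the largest Whisper model whose approximate VRAM need fits, or None."""
    fitting = [m for m in MODEL_ORDER if WHISPER_VRAM_GB[m] <= vram_gb]
    return fitting[-1] if fitting else None
```

`largest_model_for(8)` returns "medium", which matches the experience above of an 8GB card topping out at the medium model.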
As I mentioned before, Whisper AI needs a lot of resources.
As a general rule, if the computer hangs or freezes, I just switch it off; there is no point letting it work all night, because it will not do anything.
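Rather than letting a stuck job run all night, a watchdog can stop it once it goes quiet. This is a minimal sketch, assuming the command prints progress regularly (SE's log shows it calling main.exe with --print-progress); if nothing is printed for stall_seconds, the process is killed. It cannot help when the whole PC locks up, only when the transcription process itself stalls. run_with_stall_watchdog is a hypothetical helper, not part of SE or Whisper.

```python
import subprocess
import threading
import time


def run_with_stall_watchdog(cmd, stall_seconds):
    """Run cmd, killing it if it prints nothing for stall_seconds.

    Returns (returncode, killed, output_lines).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # progress may go to stderr, so merge it
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    lines = []
    last_output = [time.monotonic()]  # updated by the reader thread

    def reader():
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    killed = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > stall_seconds:
            proc.kill()  # no output for too long: assume the job is stuck
            killed = True
            break
        time.sleep(0.2)

    proc.wait()
    thread.join(timeout=5)
    return proc.returncode, killed, lines
```

You would pass it the same main.exe command line that appears in error_log.txt, split into a list, with a generous limit such as `stall_seconds=30 * 60`, since long pieces can go quiet for a while between progress updates.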
Is your video easily accessible? I can make you SRT files with different models using my system with the command line. It has never failed me unless I ask it to do the impossible, which is the large model.
I have had success.
I don't know where my original message went, but here's the result:
I looked at Task Manager the following day, and it showed some blips, then no activity in the display. The job was done
after 20 hours, which I don't see as unusual for the small model.
So I replugged the Ethernet and reset msconfig to its normal mode with the services back on.
Perhaps there is some programming trick SE could use so that the job completes with Win10 running normally.
Thanks to all who answered.
One further observation.
I looked at a very early Whisper job I did but hadn't scanned through its whole length.
I have no explanation for a page-long series of "hiccups" where the same line is repeated
over and over: not an error message, but a short piece of text such as "continues reading."
After it there is nothing but that same two-word phrase for a page or so, time-stamped as I recall.
I can only conjecture that, before the Ethernet was unplugged, those were
attempts to access the system.
The full description of the condition_on_previous_text option is in the Whisper help file I attached previously:
if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop (default: True)
What this means is that disabling it helps prevent the same text from being repeated over and over again.
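For anyone who does use the command line, here is a sketch of where that option goes in an openai-whisper command. It assumes the Python whisper CLI (not the main.exe that SE calls) and its --model, --language, --output_format, --output_dir and --condition_on_previous_text options; whisper_cli_args is a hypothetical helper that only builds the argument list.

```python
def whisper_cli_args(audio_file, model="medium", language="en",
                     condition_on_previous_text=False, output_dir="."):
    """Build an openai-whisper CLI command as a list for subprocess.run."""
    return [
        "whisper", str(audio_file),
        "--model", model,
        "--language", language,
        "--output_format", "srt",
        "--output_dir", str(output_dir),
        # False stops feeding the previous window's text back in as a prompt,
        # which the help text says makes the model less prone to failure loops.
        "--condition_on_previous_text", str(condition_on_previous_text),
    ]
```

Running `subprocess.run(whisper_cli_args("play.wav"), check=True)` would then write play.srt to the current folder, at the cost of possibly less consistent text between windows, as the help text warns.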
Yes, I realized it was a different problem after I recoded the same piece last night. Thanks for the tip, but I am not using the command line. Is there some sort of trigger from special characters like " " , [ ] * in the program that does this? Is this a bug that SE can fix in how it loads the program?
But since you don't use the command line, look for GUI alternatives for isolating speech or vocals.
A good idea.
Apologies for confusing editing with translating above.
I was told earlier that for best results in Whisper I should let the program load the audio from the MKV.
If I loaded the WAV (I guess) directly, I could use what you suggest, or the multiple voice tools available in programs such as GoldWave or
Audacity, or an old favorite called The Levelator, which was developed for podcasts and brings voice forward. That produces a WAV directly.
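If you do want to prepare the WAV yourself (for example to run it through The Levelator first), ffmpeg can pull the audio out of the MKV. This sketch assumes ffmpeg is on the PATH and converts to 16 kHz mono 16-bit PCM, the sample rate Whisper works at; the temporary .wav in SE's log suggests SE does a similar conversion itself, but that is a guess. ffmpeg_extract_wav_args is a hypothetical helper.

```python
from pathlib import Path


def ffmpeg_extract_wav_args(video_file, wav_file=None):
    """ffmpeg arguments to extract an MKV's audio as 16 kHz mono 16-bit WAV."""
    video_file = Path(video_file)
    wav_file = Path(wav_file) if wav_file else video_file.with_suffix(".wav")
    return [
        "ffmpeg", "-y",         # overwrite an existing output file
        "-i", str(video_file),
        "-vn",                  # drop the video stream
        "-ac", "1",             # mono
        "-ar", "16000",         # 16 kHz
        "-c:a", "pcm_s16le",    # plain 16-bit PCM WAV
        str(wav_file),
    ]
```

Passing the list to `subprocess.run(..., check=True)` writes play.wav next to play.mkv, ready for a voice tool and then for transcription.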
I looked at the Spleeter remarks. There are some conflicts there with usage. But I think The Levelator, which I know works and doesn't produce any weird sound from trying to split tracks, is a good place to start. It's just a simple drag and drop. I've used it for years.
Here's a bit about it:
This does sound like a Whisper problem, though.
Thanks for all the referrals on this.
I'd prefer fixes to the tools themselves rather than trying to do patches. I'm not an expert user. Filtering for voice with something like The Levelator is all I can manage. The source is old (1978) and from a public source on YouTube. It may just be the audio degradation. However, VOSK is a good fallback for me. My original for this piece was completed in VOSK and then hand-edited. It's nearly done. I'm just proofing it by ear and looking up some phrases.
I didn't realize that this was just a one-time task.
Anyway, I am glad you finally got your transcription.
When you feel brave enough to consider a command-line Whisper AI setup, I will be happy to help.
It usually takes an experienced user no more than 20-30 minutes to set up everything from scratch.
Thanks for the considerate offer. A while back there was a member who showed how to set up Python etc., and I tried that.
Errors ensued, so I'd rather leave it to SE and Whisper. After all, SE has a whole team of contributors. If the problem is a known error, they should get it resolved. I have not contacted the SE developers directly about it. Perhaps you or others could explain the problem better than I can.
As for tasks, I've only seen this peculiar error from one source. But as I said, the source may be degraded in some way I don't understand.
I have now seen the repeated-line error in Whisper on a new project using SE.
But the text eventually comes back on. I did not pull the power plug or touch the rig until the job showed
To compensate, and maybe complete this task, I'm doing the same job with VOSK, and then I'll see if I can fill in the lines that were listed for a while as [music] [music] [music] ... after a music sequence had concluded in the Whisper transcription. The behaviour is very odd, since the spoken language is very clear.
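To find exactly which stretches need filling in from the VOSK version, a short script can scan Whisper's SRT for runs of identical consecutive lines and report their time ranges. This is a sketch assuming standard SRT (index line, "start --> end" line, text, blank line); parse_srt and repeated_runs are hypothetical helpers, and min_repeats is an arbitrary threshold.

```python
import re

CUE_RE = re.compile(
    r"(\d+)\s*\n(\d\d:\d\d:\d\d,\d\d\d) --> (\d\d:\d\d:\d\d,\d\d\d)\s*\n(.*?)(?:\n\s*\n|\Z)",
    re.S,
)


def parse_srt(text):
    """Return (start, end, text) tuples, with each cue's text joined onto one line."""
    text = text.replace("\r\n", "\n")
    return [(m.group(2), m.group(3), " ".join(m.group(4).split()))
            for m in CUE_RE.finditer(text)]


def repeated_runs(cues, min_repeats=3):
    """Find runs of consecutive cues with identical text as (start, end, text, count)."""
    runs, i = [], 0
    while i < len(cues):
        j = i
        while j + 1 < len(cues) and cues[j + 1][2] == cues[i][2]:
            j += 1
        count = j - i + 1
        if count >= min_repeats:
            runs.append((cues[i][0], cues[j][1], cues[i][2], count))
        i = j + 1
    return runs
```

Each reported (start, end) pair is a window to replace with the matching VOSK lines or to check by ear.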
To Whisper's credit, the lines that do come through as text are largely very good using the small model in Subtitle Edit.
I wonder whether the medium model would now work if I continue to unplug the Ethernet to avoid any Microsoft jamming. (See my previous posts above on pulling the Ethernet plug to complete a job.)