Voice to text converter

9th Aug 2018 21:17 #1
carlmart

Member

Join Date
May 2004

Location
Brazil
I am not sure where in the forum to put this question, but as it relates to video dialogue, let's start here.

Does anyone know which program YouTube (and others) is using to convert spoken words into text for the subtitles you can now turn on?

Some time ago, when my father had a stroke and thought continuing to use his computer might be a good thing, a friend of mine told me about a program that let you talk into a microphone and converted your words into text.

You had to "teach" the program until it recognized your voice. But this new program seems to be more sophisticated than that, as it recognizes any person's voice speaking in the videos.

My wife also told me that X-ray machines working with computers allow the diagnosis to be dictated into the computer's mic and converted to text.

I haven't googled this yet, which I will do, but maybe someone here is already familiar with, or has used, a program that does this.

9th Aug 2018 22:25 #2
KarMa

Dinosaur Supervisor

Join Date
Jul 2015

Location
US
Windows itself has voice-to-text conversion built in. You have to spend 30 minutes training it, but it seemed to work for me after I did the training. Not sure how good it is with languages other than English, though. It's called Windows Speech Recognition.

9th Aug 2018 22:36 #3
carlmart

Member

Join Date
May 2004

Location
Brazil
Sorry, but I did not explain what I was going to use it for.

What I want is to feed in the audio from a video, not a microphone, and have the program convert it into text, exactly as YouTube apparently does.

Of course it would be even better to convert that audio into timed text, like a subtitle, but that is probably too much to ask.

But if the subtitle can be shown on the screen, I might get an OCR program like SubtitleEdit to recognize it.
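
To make "timed text" concrete: in the common SubRip (.srt) subtitle format, each entry is a running number, a "start --> end" time line, and the text, followed by a blank line. The Python sketch below only shows that layout. The function names and the sample segments are made up for illustration, and it assumes some recognizer has already produced a start time, end time, and text for each line.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Made-up sample data, standing in for whatever a recognizer would return.
segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 5.04, "Today we talk about subtitles."),
]
print(to_srt(segments))
```

The resulting text can be saved with a .srt extension and opened in a subtitle editor for correction.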

9th Aug 2018 23:01 #4
redwudz

Mod Neophyte

Join Date
Sep 2002

Location
USA
Dragon NaturallySpeaking is a popular payware speech-to-text converter that has been around for a long time.

Speech recognition software has improved quite a bit in the last few years.
Even my car has speech recognition and it works surprisingly well. I haven't tried the W10 speech recognition program.

Generally, speech recognition makes a fair number of errors, so the result is rarely perfectly correct text. No surprise, as we all talk differently.
Training/teaching the interface does help. Some languages are even harder to convert. Good luck with that.
I often do better listening and just typing. But I would try some of the programs available to see what may work for you.

These types of programs may save some time when transcribing audio to text.
But be sure to check and correct any text for errors.

Humans are still smarter than machines.

Last edited by redwudz; 9th Aug 2018 at 23:09.

10th Aug 2018 07:02 #5
carlmart

Member

Join Date
May 2004

Location
Brazil
Correcting the recognized text is certainly a must.

But I am amazed at how accurate the dialogue on YouTube videos has become lately, with very few words wrong. That was not the case until fairly recently.

The question still remains, for the videos whose audio I want recognized, of the timing for each line of speech. There probably isn't a way to do that automatically, unless the text shows on the video. I wonder how YouTube does that.
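
One way the timing could come out automatically, assuming a recognizer that reports a start and end time for each word (an assumption here; no program named in this thread is confirmed to do that, and this is not a claim about how YouTube works): start a new subtitle line whenever there is a long enough pause. A minimal Python sketch, with invented word timings and hypothetical names and thresholds:

```python
def group_words(words, max_gap=0.6, max_words=8):
    """Group (word, start, end) tuples into (start, end, text) cues.

    A new cue starts when the pause before a word exceeds max_gap seconds
    or the current cue already holds max_words words.
    """
    cues = []
    current = []
    for word, start, end in words:
        too_long_pause = current and start - current[-1][2] > max_gap
        if current and (too_long_pause or len(current) >= max_words):
            text = " ".join(w for w, _, _ in current)
            cues.append((current[0][1], current[-1][2], text))
            current = []
        current.append((word, start, end))
    if current:
        # Flush the words collected after the last pause.
        text = " ".join(w for w, _, _ in current)
        cues.append((current[0][1], current[-1][2], text))
    return cues


# Invented word timings, for illustration only.
words = [
    ("Correcting", 0.0, 0.6),
    ("the", 0.6, 0.7),
    ("text", 0.7, 1.1),
    ("is", 2.0, 2.1),
    ("a", 2.1, 2.2),
    ("must", 2.2, 2.6),
]
for cue in group_words(words):
    print(cue)
```

The pause length and words-per-line limit are arbitrary starting values; they would need tuning by ear for real dialogue.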

10th Aug 2018 07:35 #6
Hoser Rob

Member

Join Date
Mar 2011

Location
Nova Scotia, Canada
Originally Posted by carlmart

Correcting the recognized text is certainly a must.

But I am amazed at how accurate the dialogue on YouTube videos has become lately, with very few words wrong. That was not the case until fairly recently.

The question still remains, for the videos whose audio I want recognized, of the timing for each line of speech. There probably isn't a way to do that automatically, unless the text shows on the video. I wonder how YouTube does that.

YT is owned by Google and uses their speech recognition tech. I suspect the reason it's become more accurate is the same reason Google Translate works so much better now ... it's running on newer billion-dollar server farms. Whether you can get the same functionality from software running on a personal computer is another question. I don't actually know the answer to that, but I'm not optimistic.

9th Oct 2021 14:56 #7
p_l

Member

Join Date
Jun 2002

Location
Montreal, Canada
YT uses machine learning with the largest data sampling you could wish for: YT videos.

9th Oct 2021 17:48 #8
Cornucopia

Member

Join Date
Oct 2001

Location
Deep in the Heart of Texas
IIRC, they do have realtime transcription services (both human and AI), but they cost quite a bit for a subscription (and they often don't offer a one- or two-time deal).
Those will also still have errors, regardless.

Scott
