VideoHelp Forum
  1. Member
    Join Date
    Feb 2004
    Location
    Australia
    Is it possible to create subtitles for a movie by extracting the audio to a WAV file, then using a speech-to-text program to convert it into a subtitle file the same length as the movie? If so, where do I start, and what software do I use?
    (I know it is possible to search the Internet for subtitles for movies, but if none exist, then I want to create my own.)
    Thanks in advance.
    Baron
  2. Always Watching guns1inger's Avatar
    Join Date
    Apr 2004
    Location
    Miskatonic U
    The biggest issues would be

    1. Most speech to text software requires training for the voice it is listening for. This is not possible for a movie cast. Those that work with minimal or no training have very poor accuracy beyond a few key words.

    2. Most speech to text software requires very clear dialogue with little to no ambient noise. Some scenes may be like this, most are not.

    3. Subtitles have two key parts to them - the dialogue, and the time code for when that dialogue should appear on screen, and then disappear. Even the simplest subtitle format - SRT - requires this as a minimum. Open an SRT file in a text editor (notepad) and have a look. And this is without going into handling different characters speaking in the same scene, characters speaking on screen or facing the audience versus characters speaking off screen or facing away from the audience etc.
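
    As a rough illustration of the two parts described above, here is a minimal Python sketch that writes cues in the SRT layout: a running index, a `start --> end` timecode line, the dialogue text, then a blank line. The `Cue` class, the function names and the sample dialogue are all made up for this example. It assumes you already have the text and timings from somewhere (typed by hand, for instance) and only shows the file format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle: when it appears, when it disappears, and what it says."""
    start: float  # seconds from the start of the movie
    end: float    # seconds from the start of the movie
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timecode used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT text: index, timecode line, dialogue, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Invented sample dialogue, purely to show the layout.
    sample = [
        Cue(1.0, 2.5, "Hello."),
        Cue(3.0, 5.25, "Is anyone there?"),
    ]
    print(to_srt(sample))
```

    Saving the returned string to a file with an .srt extension (UTF-8 is the usual safe choice) should give something a player or Subtitle Workshop can open. The hard part discussed in this thread, getting accurate text and timings in the first place, is untouched by this.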

    It is not impossible; however, at this time it is highly unlikely that you can get a clean speech to text transcription from a movie and then have it create workable subtitles with specific formatting and timecodes. It would require very specific software that is not publicly available (if it even exists).
    Read my blog here.
  3. Far too goddamn old now EddyH's Avatar
    Join Date
    Jan 2003
    Location
    Soul sucking suburbia! But a different part since I last logged on.
    ^ This.

    Better STT is something that many people, e.g. Google and Apple, are busily working on for their application and device interfaces, but it'll still be a while before anyone applies it to subtitling. When that happens it'll probably go into automated closed captioning for TV, babelfish-esque speech-to-speech translators, accessibility accessories for professional presentations, etc.

    Most things where accurate transcription is required either rely on a speech-to-text user doing some quick on-the-fly or post-editing, or just bite the bullet and use what's gamely referred to as "human transcription" - it's far cheaper and easier to pay some cubicle monkey in an emerging economy to do it for you, and spend a couple of minutes when they send it back checking for any cross-cultural confusion.

    You'd probably be much better off taking the time you'd spend trying to get something like this to work and simply sitting down with one hand on the keyboard and the other on the play/pause button of a remote (put it in a proper DVD player with the OSD turned on) and doing it yourself. I had to do that for my own subs, which are of a distinctly non-commercial-material nature.
    -= She sez there's ants in the carpet, dirty little monsters! =-
    Back after a long time away, mainly because I now need to start making up vidcapped DVDRs for work and I haven't a clue where to start any more!
  4. Member
    Join Date
    Feb 2004
    Location
    Australia
    Thanks guns1inger and EddyH for your replies. Yes EddyH, re your last paragraph, this is exactly what I am doing using Subtitle Workshop, where I can see the movie and edit the subs where needed.
    The reason I thought there might be some kind of application to do the job is that I downloaded the SRT file for The Pacific Ep01, and quite a few words in it are either misspelled or garbled and need correcting. They are the sort of errors you would see from speech to text applications that don't work perfectly.
  5. Always Watching guns1inger's Avatar
    Join Date
    Apr 2004
    Location
    Miskatonic U
    More likely that it was typed by someone who either could not type, or could not spell.
  6. Or they used a translator website to convert it.

    Did you know places like subscene have free subs?
  7. Far too goddamn old now EddyH's Avatar
    Join Date
    Jan 2003
    Location
    Soul sucking suburbia! But a different part since I last logged on.
    Yeah, I'd go with just a cack-handed typist.
    Speech-to-Text often works off a dictionary file, so if it misidentifies something, it misidentifies it HARD... more than just a couple of character typos, you'll have an entirely different word (or one split into two, or two merged into one), etc.
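
    The distinction above can be turned into a crude check, sketched here in Python. This is only a heuristic under loud assumptions: you have a known-correct reference line to compare against, both lines have the same number of words (so it won't handle the split/merged word case), and `vocabulary` stands in for whatever word list you trust. The function name and labels are made up for this example. A wrong word that is still a real dictionary word looks like the dictionary-driven kind of mistake; a wrong word that isn't in the dictionary at all looks like a typing slip.

```python
def classify_errors(
    reference: str, transcript: str, vocabulary: set[str]
) -> list[tuple[str, str, str]]:
    """Label each mismatched word as a 'word substitution' or a 'typo'.

    Assumes both lines contain the same number of words, so words can be
    compared position by position. Returns (expected, actual, label) tuples.
    """
    results = []
    pairs = zip(reference.lower().split(), transcript.lower().split())
    for expected, actual in pairs:
        if expected == actual:
            continue
        # A real word in the wrong place suggests dictionary-based guessing;
        # a non-word suggests fingers slipping on a keyboard.
        label = "word substitution" if actual in vocabulary else "typo"
        results.append((expected, actual, label))
    return results


if __name__ == "__main__":
    # Tiny stand-in vocabulary; a real check would load a proper word list.
    words = {"the", "teaching", "kitchen", "rooms", "are", "open"}
    for row in classify_errors(
        "the teaching rooms are open",
        "the kitchen rooms are opne",
        words,
    ):
        print(row)
```

    Note that a human can produce a "word substitution" too (the "kitchen rooms" example below was a mishearing by a person), so a result like this hints at how the errors arose but doesn't prove anything.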

    When you're having to transcribe something of that length - you are after all re-writing the entire damn script - it's easy for errors to creep in. I didn't catch all my own mistakes from when I typed up a presentation I'd been physically present at until about the third or fourth go around. Typing slips, missed capitals or punctuation, and the occasional wrong word order or completely wrong word, STT style. (My favourite was mistaking someone mumbling "teaching rooms" as "kitchen rooms" - must have been coming up to lunchtime!)

    But I can see why you'd think that all the same!
    I'd love to have a reliable robot dictation machine / translator...

    EDIT: 2017, and it's STILL trash. Alexa and Google Home make a slightly better bash at it, and if you have something like Dragon NaturallySpeaking that's properly trained to your voice and you speak through a headset mic in a quiet environment, it can work reasonably well (with just a few adjustments here and there), but the general grade of peach wreck ignition grow hams remains bobbins. I've played with Cortana and as far as she's concerned I may as well be talking Sumerian. Both "Google Now" and "S-Voice" on my Samsung phone, as well as the speech recognition that comes along with the default keyboard and with Swiftkey are good for entertainment purposes but little use for serious text input or hands-free/eyes-off control of the device.

    We've been working on this stuff for probably 50 years now, and 7 since this conversation, and the state of the art seems barely any further advanced. (I expect the better examples I just gave actually only pick from a fairly limited range of actions they can perform and guess at which is closest to what you said - similar to how the phone can actually send a text message or post a Facebook status with fair reliability, but fails at rendering anything but the shortest, simplest free-text sentences, and has no concept of un-pairing a Bluetooth device... merely turning the Bluetooth radio on and off. Repeatedly...)

    Maybe the human auditory and linguistic cortices that come with a couple million years of evolution backing up their instincts and core structure, and spend six or seven years of continual intensive development and training getting properly up to speed (between the formation of the ear in the womb, through to early-middle childhood), with an entire brain's worth of context memory, intuition, and ability to reason out what the most likely meaning is for what was just heard, are just too complicated a thing for a typical small accessory computer program to emulate, and full two-way vocal communication between human and machine won't be realised until we have Yer Actual A.I.s at a sufficient level of development that they can "think" similarly to a somewhat autistic 5-year-old?

    Oh, and what I said above about relying on someone in an outsourced typing pool? Well... besides the existence of Amazon Mechanical Turk services (which are *literally* that, except the worker/s might be anywhere in the world, and there could be between one and thousands of them depending on the job and how much you pay), the Panopto lecture capture system we now use in my workplace offers an add-on (as in, pay-to-upgrade) "automatic" captioning service that, as far as we've been semi-officially told, is actually a manual service ... just using someone else's hands other than the paying customer's...

    Crowdsourcing, it's the future, so it seems. We are all to just be ants inside the Heath Robinson ant-puter.
    Last edited by EddyH; 4th May 2017 at 06:26.
  8. Originally Posted by zenzen1
    Is it possible to create subtitles for a movie by extracting the audio to a WAV file, then using a speech-to-text program to convert it into a subtitle file the same length as the movie? If so, where do I start, and what software do I use?
    (I know it is possible to search the Internet for subtitles for movies, but if none exist, then I want to create my own.)
    Thanks in advance.
    Baron
    Dear Zenzen,
    Have you found any solution?


