VideoHelp Forum
  1. Member
    Join Date
    Feb 2022
    Location
    Australia
    Hi all - this is the closest section I can find for this question, and I have spent considerable time searching this forum for anything to do with language detection, but found nothing!

    As the title says, all I want to know is whether there is any commonly available software that can detect what language is being spoken in a video soundtrack or audio file.

    There must be! But according to 90% of the answers on Quora, no one seems to think there is. Yet even VidCoder has something basic. It seems to be able to detect Latin-based languages and English, of course. I work with videos filmed in different countries and I would often like to know what language is being spoken. VidCoder has a "Soundtrack" field at the top left, and for one video I loaded in, it reported that the soundtrack was Portuguese. It could do Spanish and Italian as well. But it doesn't know Slavic languages (e.g. Russian, Polish, Ukrainian, Czech, Serbian). It just says "Unknown soundtrack", even though I know clear Russian is being spoken.

    However, maybe VidCoder was not picking up the spoken language as such, but rather a language code embedded in the video soundtrack?

    What I would like to know is: is there any software, video or otherwise, that can tell me what language is being spoken when I load in a video with a soundtrack in an unknown language? That's all. So if I load in a clip spoken in a language I don't recognise (e.g. Romanian, Polish, Russian, or any of dozens of languages), will it be able to tell me? Surely that technology must be available these days?

    Even if it only said "Slavic detected", that would help, since there are at least twelve common Slavic languages, as well as many Latin American (Spanish-based) dialects in South America.

    Thanks for any ideas
    Last edited by Dart77; 17th Sep 2022 at 10:47. Reason: clarification
  2. VidCoder and similar tools only look at the language tags of the streams.
    I don't know of any tool that uses some (probably AI-based) method to guess which language the audio is in.
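For anyone who wants to check the stream tags directly, here is a minimal sketch in Python. It assumes ffprobe (part of FFmpeg) is installed and on the PATH; the function names `audio_language_tags` and `probe` are made up for this example. Note that it only reports what the file's metadata claims, not what is actually spoken.

```python
import json
import subprocess


def audio_language_tags(ffprobe_json: str) -> list[tuple[int, str]]:
    """Return (stream index, language tag) pairs from ffprobe JSON output."""
    data = json.loads(ffprobe_json)
    result = []
    for stream in data.get("streams", []):
        tags = stream.get("tags", {})
        # "und" (undetermined) is the usual code for a stream with no tag
        result.append((stream.get("index"), tags.get("language", "und")))
    return result


def probe(path: str) -> str:
    """Run ffprobe on a file and return its JSON report for audio streams."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a",  # audio streams only
        "-show_entries", "stream=index:stream_tags=language",
        "-of", "json",
        path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Hand-written sample in the shape ffprobe produces, so this runs without a file
sample = '{"streams": [{"index": 1, "tags": {"language": "por"}}, {"index": 2}]}'
print(audio_language_tags(sample))  # [(1, 'por'), (2, 'und')]
```

With a real file you would call `audio_language_tags(probe("clip.mkv"))`, where `clip.mkv` is a placeholder name. A stream that comes back as "und" simply has no tag, which would explain an "Unknown soundtrack" label regardless of what language is spoken.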
  3. Member
    Join Date
    Feb 2022
    Location
    Australia
    Originally Posted by Selur View Post
    VidCoder and similar tools only look at the language tags of the streams.
    I don't know of any tool that uses some (probably AI-based) method to guess which language the audio is in.
    Thanks for the input.
  4. Member Cornucopia's Avatar
    Join Date
    Oct 2001
    Location
    Deep in the Heart of Texas
    Considering there are hundreds, if not thousands, of languages spoken, and probably hundreds of dialects per (major) language, plus each individual's unique vocal quirks and accent, plus recordings of varying quality, the ability to parse, process, pattern-match, and sort all of this into simplified language categories is a monumental task, to put it mildly.

    Look at Google Translate... it has trouble with some words and phrases (in auto-detect mode), and that's with something as streamlined and consistent as text. It doesn't even recognize one of the languages I know.

    Scott
  5. Member
    Join Date
    Feb 2022
    Location
    Australia
    Thanks for the reply, Scott. The way you put it, you are undoubtedly correct. There are just too many variables, but I still don't fully agree with it being in the "Too Hard Basket". I have heard that AI is moving forward in this area. If they have the technology to create relatively realistic synthetic voice narration (even if it is still obvious and rather irritating to listen to), I just can't see why they can't have language recognition for at least some main language groups like Slavic, Hebrew, or Latin-based. The speed of the spoken content is not an issue: the software can easily slow it down to whatever speed it needs to detect the inflections and groupings of sounds that are common to each language. It could have a database of common words, phrases, and accents/dialects and match the sound against these. The more spoken audio it is given, the more accurate the result would be.

    I'm sure it's out there, just maybe not commonly available as a tool yet.

    -------------------------------------------

    Well, I've just come back from an hour's break from writing this post. I did some more research and it seems I have answered my own question. For anyone interested in this thread, I just found this site: https://translatedlabs.com/spoken-language-identifier . Translated Labs have offices in the USA and three major European countries, and they have engineered software called "Spoken Language Identifier". I'll cut and paste some of the basic information on it:

    "Spoken Language Identifier is a service that tries to determine the language spoken in an audio recording. The model currently supports 8 languages: English, Spanish, Italian, French, German, Portuguese, Dutch, and Russian. You can test the spoken language identifier in several ways: recording your audio, uploading an audio file or using one of our examples:"

    "Technology: The model uses convolutional and recurrent neural networks trained on tens of hours of speech data. This is an end-to-end model that uses a raw waveform as input and makes no assumptions about the phonetics or the grammars of the languages considered. Rather, it tries to infer all the relevant features of the audio from the data. It produces the probability distribution over the languages recognized by the model as the output.... You can use it to classify recordings as short as 1 second and as long as a minute. Note that the longer the recording, the higher the accuracy of the prediction. For 20 second recordings the accuracy is about 95%, while for 5 second samples it is just over 80%..."

    Other parts of the website explain how to purchase it, and also invite interest from engineers who think they can improve on their prototype model.

    So there you have it! They let you test it yourself, so later I will find some audio in a language I do not know, convert it to MP3, and try it out.
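The quoted description says the model outputs a probability distribution over its languages and gets more accurate with longer recordings. The sketch below is one way to picture why more audio helps. It is not Translated's method or API: it assumes some classifier (not shown) has already produced a distribution for each short segment, and the function names, the 0.8 threshold, and the numbers are all invented for illustration.

```python
import math


def combine_segments(segment_probs: list[dict[str, float]]) -> dict[str, float]:
    """Merge per-segment language probabilities into one distribution.

    Sums log-probabilities per language (treating each segment as
    independent evidence) and renormalises so the result sums to 1.
    """
    languages = segment_probs[0].keys()
    eps = 1e-9  # floor to avoid log(0)
    log_scores = {
        lang: sum(math.log(max(seg.get(lang, 0.0), eps)) for seg in segment_probs)
        for lang in languages
    }
    peak = max(log_scores.values())  # subtract the peak for numerical stability
    unnorm = {lang: math.exp(score - peak) for lang, score in log_scores.items()}
    total = sum(unnorm.values())
    return {lang: value / total for lang, value in unnorm.items()}


def best_guess(probs: dict[str, float], threshold: float = 0.8) -> str:
    """Return the top language, or 'uncertain' if it is below the threshold."""
    lang = max(probs, key=probs.get)
    return lang if probs[lang] >= threshold else "uncertain"


# Invented per-segment outputs for three short chunks of the same clip
segments = [
    {"ru": 0.6, "pt": 0.3, "es": 0.1},
    {"ru": 0.7, "pt": 0.2, "es": 0.1},
    {"ru": 0.5, "pt": 0.4, "es": 0.1},
]
print(best_guess(combine_segments(segments[:1])))  # one chunk: "uncertain"
print(best_guess(combine_segments(segments)))      # three chunks: "ru"
```

No single chunk clears the threshold on its own, but three chunks that all lean the same way do, which is the general intuition behind "the longer the recording, the higher the accuracy".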


