VideoHelp Forum
  1. I like to translate a lot of subtitles. In general I am satisfied with the translations from Google or, even better, DeepL. I also found several tools that use automatic translation services (for example https://translatesubtitles.com or SubtitleEdit). Both use Google Translate, and the results are horrible. The problem is that these tools translate the subtitle chunks directly rather than the full sentences. The advantage is that the timecode alignment of the translated chunks is always correct, but the translation quality suffers, because automatic translation services work best on full sentences.

    In 2017 there was a Google Code-in task addressing exactly this problem ( https://codein.withgoogle.com/archive/2017/organization/5637337312657408/task/6104526406811648/ ):
    Things to watch for: You will need to pass complete sentences to deepl, and of course sentences can take more than one subtitle frame, so it's not as easy as read line by line, or frame by frame. You will need to buffer a bit, and well, be a bit clever.
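    The buffering described in that task could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not code from any of the tools mentioned: the Cue class, the group_into_sentences function and the "sentence ends at . ! or ?" rule are all hypothetical, and the rule will misfire on abbreviations or ellipses.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: str  # timecode, kept as an opaque string here
    end: str
    text: str


# Assumed rule: a sentence ends with '.', '!' or '?',
# optionally followed by closing quotes or brackets.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*$')


def group_into_sentences(cues):
    """Return (sentence_text, cue_indices) pairs covering all cues in order."""
    groups = []
    buffer_text, buffer_idx = [], []
    for i, cue in enumerate(cues):
        # Collapse line breaks inside a cue so the translator sees running text.
        buffer_text.append(" ".join(cue.text.split()))
        buffer_idx.append(i)
        if SENTENCE_END.search(buffer_text[-1]):
            groups.append((" ".join(buffer_text), buffer_idx))
            buffer_text, buffer_idx = [], []
    if buffer_text:
        # Trailing text without a sentence terminator still gets translated.
        groups.append((" ".join(buffer_text), buffer_idx))
    return groups


cues = [
    Cue("00:00:01,000", "00:00:02,500", "I never thought"),
    Cue("00:00:02,600", "00:00:04,000", "that this would\nwork so well."),
    Cue("00:00:04,200", "00:00:05,000", "Really?"),
]
sentence_groups = group_into_sentences(cues)
```

    Each group keeps the indices of the cues it came from, so a translated sentence can later be mapped back to the original timecodes, however many cues the sentence spans.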
    Additionally, I found that a lot of scientific work is going on in professional subtitle translation. I found many academic papers describing sophisticated approaches, but I can't find any suitable software that implements them.

    So here is the question:
    Do you know any software (free or paid) that handles automatic translation properly, i.e. passes full sentences to the translator and splits the result back into subtitle chunks?
  2. Member
    Join Date
    Jul 2011
    Location
    Denmark
    SubtitleEdit does stuff like merging lines and handling dialogs, tags, and other stuff: https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Translate/Formatting.cs
    It has worked like that for the last couple of versions.
  3. Member
    Join Date
    Jul 2007
    Location
    United States
    No AI can accurately translate text automatically, because sentence structures differ between languages and, especially with subs, what's written depends heavily on the context of the scene. This is especially difficult when a joke or a pun is used. What's funny in one language may be completely incomprehensible in another.

    I'm not familiar with other languages, but I know that auto translation for character-based Asian languages (Chinese, Korean, Japanese) is often poor because a single character usually translates to a word, but an additional character before or after can change the meaning completely.
  4. Originally Posted by Nikse
    SubtitleEdit does stuff like merging lines and handling dialogs, tags, and other stuff: https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Translate/Formatting.cs
    It has worked like that for the last couple of versions.
    Hi, thanks for the answer. Unfortunately this code does not provide the sentence-merging feature I am asking for.
    1.) It does not merge or split sentences; you can see in the code that it searches for other markers (like ',' or '-').
    2.) The SetTagsAndReturnTrimmed method accepts input and inputNext. Looking at the calling context (https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Translate/GoogleTranslator1.cs), you can see that the input variable does not merge multiple paragraphs. Each iteration fetches a new paragraph (line 47: var p = paragraphs[index]) and a new next text (line 58: nextText = paragraphs[index + 1].Text). This means the merging affects at most 2 neighboring paragraphs.
    3.) I tested it with several chunked sentences. SubtitleEdit does not merge sentences. I verified this by manually merging the chunks into sentences and translating them with Google Translate.
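
    Merging more than two paragraphs only helps if the translated sentence can be spread back over the original timecodes afterwards. Below is a minimal Python sketch of one possible way to do that; it is not code from SubtitleEdit. It cuts the translated sentence at word boundaries in proportion to the character length of each original chunk. The function name split_translation and the sample strings are hypothetical, and the Dutch sentence is hand-written for illustration, not output from any translation service.

```python
def split_translation(translated, original_chunks):
    """Split `translated` into len(original_chunks) parts at word boundaries,
    sized roughly in proportion to the length of each original chunk."""
    words = translated.split()
    total = sum(len(chunk) for chunk in original_chunks) or 1
    parts = []
    start = 0
    consumed = 0
    last = len(original_chunks) - 1
    for n, chunk in enumerate(original_chunks):
        consumed += len(chunk)
        if n == last:
            end = len(words)  # the last cue takes whatever is left
        else:
            # Cumulative share of the words that belongs to cues 0..n.
            end = max(start, round(len(words) * consumed / total))
        parts.append(" ".join(words[start:end]))
        start = end
    return parts


original_chunks = ["I never thought", "that this would work so well."]
translated = "Ik had nooit gedacht dat dit zo goed zou werken."
parts = split_translation(translated, original_chunks)
```

    A proportional split ignores grammar, so a cue may end in the middle of a phrase. It only guarantees that the words stay in order and that the amount of text per cue roughly follows the original timing.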

    Anyway: THANK you very much for the hint that SubtitleEdit is open source! I will implement the feature I requested myself.



    Originally Posted by lingyi
    No AI can accurately translate text automatically, because sentence structures differ between languages and, especially with subs, what's written depends heavily on the context of the scene. This is especially difficult when a joke or a pun is used. What's funny in one language may be completely incomprehensible in another.

    I'm not familiar with other languages, but I know that auto translation for character-based Asian languages (Chinese, Korean, Japanese) is often poor because a single character usually translates to a word, but an additional character before or after can change the meaning completely.
    I completely agree with that. But this doesn't contradict the fact that current AI translation works better with full sentences than with separate chunks. I am not looking for a perfect AI translation; I am trying to improve the current approach to get better results.
    Last edited by wandmaler; 30th Oct 2020 at 06:26.