I translate a lot of subtitles. In general I am satisfied with the translations from Google or, even better, DeepL. I also found that several tools use automatic translation services (for example https://translatesubtitles.com or SubtitleEdit). Both use Google Translate, and the results are horrible. The problem is that these tools translate the individual subtitle chunks directly instead of the full sentences. This has the advantage that the timecode alignment of the translated chunks is always correct, but the translation quality suffers, because automatic translation services work best on full sentences.
In 2017 there was a Google code challenge addressing exactly this problem ( https://codein.withgoogle.com/archive/2017/organization/5637337312657408/task/6104526406811648/ ):
"Things to watch for: You will need to pass complete sentences to deepl, and of course sentences can take more than one subtitle frame, so it's not as easy as read line by line, or frame by frame. You will need to buffer a bit, and well, be a bit clever."
Additionally, I found out that a lot of scientific work is going on in professional subtitle translation. I found many academic papers that describe sophisticated approaches, but I can't find any suitable software that implements them.
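A minimal Python sketch of the buffering idea from the task hint quoted in this post: consecutive subtitle frames are collected until one ends with sentence punctuation, the whole sentence goes to the translator, and the result is split back across the original frames in proportion to their original text length, so the timecodes stay untouched. The Cue class, the function names, and the translate callable (standing in for a DeepL or Google call) are all hypothetical, and the punctuation-based sentence detection is an assumption that will misfire on abbreviations and ellipses.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle frame (hypothetical structure, times in seconds)."""
    start: float
    end: float
    text: str


# Assumption: a sentence ends with . ! or ?, optionally followed by a closing quote/bracket.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*$')


def group_into_sentences(cues):
    """Buffer consecutive cues until the latest one ends a sentence."""
    groups, buffer = [], []
    for cue in cues:
        buffer.append(cue)
        if SENTENCE_END.search(cue.text.strip()):
            groups.append(buffer)
            buffer = []
    if buffer:  # trailing text without final punctuation
        groups.append(buffer)
    return groups


def split_proportionally(sentence, weights):
    """Split a sentence at word boundaries into len(weights) chunks,
    sized roughly in proportion to the given weights."""
    words = sentence.split()
    total = sum(weights) or 1
    chunks, pos, acc = [], 0, 0
    for i, weight in enumerate(weights):
        acc += weight
        if i == len(weights) - 1:
            end = len(words)  # last chunk takes whatever is left
        else:
            end = max(pos, round(len(words) * acc / total))
        chunks.append(" ".join(words[pos:end]))
        pos = end
    return chunks


def translate_cues(cues, translate):
    """Translate whole sentences, then map them back onto the original timecodes.

    `translate` is any callable str -> str; plugging in a real service
    is left to the caller.
    """
    result = []
    for group in group_into_sentences(cues):
        sentence = " ".join(cue.text.strip() for cue in group)
        translated = translate(sentence)
        weights = [len(cue.text) for cue in group]
        for cue, chunk in zip(group, split_proportionally(translated, weights)):
            result.append(Cue(cue.start, cue.end, chunk))
    return result
```

Splitting by character proportion is only the simplest possible choice. Because word order differs between languages, the chunk shown in a frame will not always correspond to what is being said at that moment, which is presumably where the "be a bit clever" part comes in.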
So here is the question:
Do you know of any software (free or paid) that can really handle translations in a proper, automatic way (passing full sentences to the translator and splitting the result back across the subtitles)?
SubtitleEdit does things like merging lines and handling dialogs, tags, and so on: https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Translate/Formatting.cs
It has worked like that for the last couple of versions.
No AI can translate text automatically with full accuracy, because sentence structures differ between languages and, especially with subs, what's written depends heavily on the context of the scene. This is especially difficult when a joke or a pun is used. What's funny in one language may be completely incomprehensible in another.
I'm not familiar with other languages, but I know that auto translation for character-based Asian languages (Chinese, Korean, Japanese) is often poor, because a single character usually translates to a word, but an additional character before or after it can change the meaning completely.
1.) It does not merge or split sentences; you can see in the code that it searches for other markers (like ',' or '-').
2.) The SetTagsAndReturnTrimmed method accepts input and inputNext. When you look at the calling context (https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Translate/GoogleTranslator1.cs) you can see that the input variable does not merge multiple paragraphs. On each iteration it gets a new paragraph (line 47: var p = paragraphs[index]) and a new next text (line 58: nextText = paragraphs[index + 1].Text). This means the merging affects at most 2 neighboring paragraphs.
3.) I tested it with several chunked sentences. SubtitleEdit does not merge sentences. I verified this by merging the chunks into sentences by hand and translating them with Google Translate myself.
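To illustrate point 2, here is a toy Python model of the difference between a loop that only ever sees the current paragraph and its immediate neighbor, and one that buffers until the sentence is complete. This is not SubtitleEdit's code: both functions, their names, and the punctuation-based sentence check are assumptions made for the illustration.

```python
import re

# Assumption: a paragraph completes a sentence if it ends with . ! or ?
SENTENCE_END = re.compile(r"[.!?]$")


def pairwise_units(texts):
    """Simplified model of a loop that can only look at texts[i] and texts[i + 1]:
    an unfinished paragraph is joined with at most one neighbor."""
    units, i = [], 0
    while i < len(texts):
        current = texts[i]
        has_next = i + 1 < len(texts)
        if not SENTENCE_END.search(current) and has_next:
            units.append(current + " " + texts[i + 1])
            i += 2
        else:
            units.append(current)
            i += 1
    return units


def buffered_units(texts):
    """Keep collecting paragraphs until one of them ends the sentence."""
    units, buffer = [], []
    for text in texts:
        buffer.append(text)
        if SENTENCE_END.search(text):
            units.append(" ".join(buffer))
            buffer = []
    if buffer:  # leftover text without final punctuation
        units.append(" ".join(buffer))
    return units


# One sentence spread over three subtitle paragraphs.
texts = ["When the train finally", "arrived at the station", "nobody was waiting."]

print(pairwise_units(texts))
# ['When the train finally arrived at the station', 'nobody was waiting.']
print(buffered_units(texts))
# ['When the train finally arrived at the station nobody was waiting.']
```

With a two-paragraph window, the translator still receives a sentence fragment as soon as a sentence spans three or more paragraphs, which matches what I saw in my tests.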
Anyway: THANK you very much for the hint that SubtitleEdit is open source! I will implement the feature I asked for myself.
Last edited by wandmaler; 30th Oct 2020 at 06:26.