VideoHelp Forum

  1. Member
    Join Date
    Aug 2022
    Location
    Singapore
    I just got a subscription to NHK On Demand, which is a treasure trove of content. I'm also a fansubber, so the Japanese subtitles are very important for my translations.

    I recently managed to extract the subtitles from the video stream. The file had been saved with an .html extension but is actually XML (I assumed TTML). The caption text is clearly in the file, but every subtitle editor and converter I tried fails to read the XML.
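Tools that support TTML generally expect a `<tt>` root element in the W3C TTML namespace (`http://www.w3.org/ns/ttml`), whereas the extract below has a `<cuepoints>` root. That mismatch is a likely, though unconfirmed, reason the converters reject the file. A minimal standard-library sketch for checking which kind of XML a file really is; the function name `detect_subtitle_xml` and the labels it returns are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace of W3C TTML documents; their root element is <tt> in this namespace.
TTML_NS = "http://www.w3.org/ns/ttml"


def detect_subtitle_xml(data):
    """Classify raw XML bytes by their root element (hypothetical helper)."""
    root = ET.fromstring(data)
    # ElementTree spells namespaced tags as "{uri}localname".
    if root.tag == "{%s}tt" % TTML_NS:
        return "ttml"
    # The NHK format posted in this thread uses a plain <cuepoints> root.
    if root.tag == "cuepoints":
        return "nhk-cuepoints"
    return "unknown"
```

Pass the raw bytes of the file (for example `open(path, "rb").read()`) so the parser honours the file's own `encoding="UTF-8"` declaration.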

    Below is an extract of the XML, showing the timecodes, positioning, coloring, etc. If anyone can help me figure out how to convert this to a conventional format such as SRT, that would be greatly appreciated!

    XML Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <cuepoints x="170" y="450" font="MS ゴシック,Osaka−等幅,ヒラギノ角ゴ ProW3,Osaka" color="0xffffff" size="36" underline="false" bold="false" italic="false" ruby="false">
    <cuepoint name="1" time="0.334"/>
    <cuepoint name="2" time="0.367">
    </cuepoint>
    <cuepoint name="3" time="0.634"/>
    <cuepoint name="4" time="0.667">
    </cuepoint>
    <cuepoint name="5" time="34.934"/>
    <cuepoint name="6" time="34.967">
    <subtitle id="201" x="198" y="449" xx="198" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="234" height="40">
    <![CDATA[(トランシーバー・]]>
    </subtitle>
    <subtitle id="202" x="432" y="449" xx="432" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="84" height="40">
    <![CDATA[増渕]]>
    </subtitle>
    <subtitle id="203" x="437" xx="437" y="429" yy="540" size="18" background="0x000000" opacity="0.5" lang="jpn" letterspacing="0" width="74" height="20" ruby="true">
    <![CDATA[ますぶち]]>
    </subtitle>
    <subtitle id="204" x="516" y="449" xx="516" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="193" height="40">
    <![CDATA[)「どうだ ]]>
    </subtitle>
    <subtitle id="205" x="709" y="449" xx="709" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="84" height="40">
    <![CDATA[水上]]>
    </subtitle>
    <subtitle id="206" x="714" xx="714" y="429" yy="540" size="18" background="0x000000" opacity="0.5" lang="jpn" letterspacing="0" width="74" height="20" ruby="true">
    <![CDATA[みずかみ]]>
    </subtitle>
    <subtitle id="207" x="793" y="449" xx="793" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="50" height="40">
    <![CDATA[」。]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="7" time="37.967"/>
    <cuepoint name="8" time="41.634"/>
    <cuepoint name="9" time="41.667">
    <subtitle id="201" x="138" y="449" xx="138" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="433" height="40">
    <![CDATA[(増渕)水上 応答しろ!]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="10" time="44.367"/>
    <cuepoint name="11" time="46.400"/>
    <cuepoint name="12" time="46.433">
    <subtitle id="201" x="318" y="449" xx="318" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="324" height="40">
    <![CDATA[何かあったのか!]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="13" time="49.067"/>
    <cuepoint name="14" time="49.100">
    <subtitle id="201" x="198" y="449" xx="198" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="606" height="40">
    <![CDATA[(トランシーバー・水上)「白い鳥が」。]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="15" time="52.400"/>
    <cuepoint name="16" time="56.734"/>
    <cuepoint name="17" time="56.767">
    <subtitle id="201" x="398" y="449" xx="398" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="170" height="40">
    <![CDATA[(爆発音)]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="18" time="59.767"/>
    <cuepoint name="19" time="62.067"/>
    <cuepoint name="20" time="62.100">
    <subtitle id="201" x="418" y="449" xx="418" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="124" height="40">
    <![CDATA[水上!]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="21" time="64.100"/>
    <cuepoint name="22" time="68.567"/>
    <cuepoint name="23" time="68.600">
    <subtitle id="201" x="398" y="389" xx="398" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="170" height="40">
    <![CDATA[(爆発音)]]>
    </subtitle>
    <subtitle id="202" x="498" y="449" xx="498" yy="620" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="290" height="40">
    <![CDATA[(増渕)うわっ!]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="24" time="70.500"/>
    <cuepoint name="25" time="70.533">
    <subtitle id="201" x="398" y="449" xx="398" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="170" height="40">
    <![CDATA[(爆発音)]]>
    </subtitle>
    </cuepoint>
    <cuepoint name="26" time="77.333"/>
    <cuepoint name="27" time="82.400"/>
    <cuepoint name="28" time="82.433">
    <subtitle id="201" x="138" y="359" xx="138" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="27" height="40">
    <![CDATA[(]]>
    </subtitle>
    <subtitle id="202" x="165" y="359" xx="165" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="84" height="40">
    <![CDATA[小磯]]>
    </subtitle>
    <subtitle id="203" x="179" xx="179" y="339" yy="540" size="18" background="0x000000" opacity="0.5" lang="jpn" letterspacing="0" width="56" height="20" ruby="true">
    <![CDATA[こいそ]]>
    </subtitle>
    <subtitle id="204" x="249" y="359" xx="249" yy="560" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="187" height="40">
    <![CDATA[)これより]]>
    </subtitle>
    <subtitle id="205" x="158" y="449" xx="158" yy="620" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="107" height="40">
    <![CDATA[弊社 ]]>
    </subtitle>
    <subtitle id="206" x="265" y="449" xx="265" yy="620" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="124" height="40">
    <![CDATA[東國実]]>
    </subtitle>
    <subtitle id="207" x="272" xx="272" y="429" yy="600" size="18" background="0x000000" opacity="0.5" lang="jpn" letterspacing="0" width="110" height="20" ruby="true">
    <![CDATA[ひがしくにみ]]>
    </subtitle>
    <subtitle id="208" x="389" y="449" xx="389" yy="620" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" width="427" height="40">
    <![CDATA[化学 第一工場で起きた]]>
    </subtitle>
    <subtitle id="209" x="816" y="449" xx="816" yy="620" background="0x000000" opacity="0.5" lang="jpn" letterspacing="4" substitution_string="→" gaiji_pattern="000000000000000000000000000000000000000000000000004000000006000000007000000007800000007C00000007E00000007F00000007F807FFFFFFC07FFFFFFE07FFFFFFF07FFFFFFF87FFFFFFFC7FFFFFFFE7FFFFFFFC7FFFFFFF87FFFFFFF07FFFFFFE07FFFFFFC0000007F80000007F00000007E00000007C00000007800000007000000006000000004000000000000000000000000000000000000000" gaiji_width="36" gaiji_height="36" width="44" height="40">
    <![CDATA[ ]]>
    </subtitle>
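A minimal sketch of turning this `<cuepoint>` structure into timed cues. It is inferred only from the extract above, so every rule in it is an assumption: a cuepoint with `<subtitle>` children starts a cue that lasts until the next cuepoint's `time` (the empty cuepoints appear to clear the screen); `time` is in seconds; pieces sharing a `y` value form one line, ordered by `x`; `ruby="true"` pieces are furigana and are dropped; and a piece with `substitution_string` (a gaiji glyph such as the arrow) is replaced by that string. The names `parse_cuepoints` and `Cue`, and the 3-second `last_cue_duration` fallback, are hypothetical. The extract is cut off mid-cuepoint, so run it on the complete file.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Whitespace that exists only because the CDATA section sits on its own line.
# A space *inside* the CDATA (e.g. "弊社 ") is kept.
_LEADING_LAYOUT = re.compile(r"^[ \t]*\r?\n[ \t]*")
_TRAILING_LAYOUT = re.compile(r"\r?\n[ \t]*$")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # display lines joined with "\n"


def _piece_text(sub):
    # Assumption: gaiji (custom glyph bitmaps) carry a plain-text stand-in.
    if "substitution_string" in sub.attrib:
        return sub.attrib["substitution_string"]
    raw = sub.text or ""  # ElementTree merges CDATA content into .text
    raw = _LEADING_LAYOUT.sub("", raw)
    return _TRAILING_LAYOUT.sub("", raw)


def _cuepoint_text(cuepoint):
    lines = {}  # y coordinate -> [(x, text), ...]
    for sub in cuepoint.findall("subtitle"):
        if sub.get("ruby") == "true":
            continue  # furigana reading drawn above a word; dropped here
        x = int(sub.get("x", "0"))
        y = int(sub.get("y", "0"))
        lines.setdefault(y, []).append((x, _piece_text(sub)))
    rendered = []
    for y in sorted(lines):            # top line first
        pieces = sorted(lines[y])      # left to right
        rendered.append("".join(t for _, t in pieces).strip())
    return "\n".join(line for line in rendered if line)


def parse_cuepoints(data, last_cue_duration=3.0):
    """Return a list of Cue objects from the raw bytes of a <cuepoints> file."""
    root = ET.fromstring(data)
    points = [(float(cp.attrib["time"]), _cuepoint_text(cp))
              for cp in root.iter("cuepoint")]
    cues = []
    for i, (start, text) in enumerate(points):
        if not text:
            continue  # empty cuepoints only clear the screen
        # Hypothetical fallback when the file ends on a cue with no successor.
        end = points[i + 1][0] if i + 1 < len(points) else start + last_cue_duration
        cues.append(Cue(start, end, text))
    return cues
```

Under these assumptions, cuepoint 6 would become one cue from 34.967 s to 37.967 s reading "(トランシーバー・増渕)「どうだ 水上」。", with the readings ますぶち and みずかみ left out. If you would rather keep the furigana, append the ruby text in parentheses instead of skipping it.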
  2. If the subs are captions, you could try extracting them with clever FFmpeg-GUI.
  3. Member
    Join Date
    Aug 2022
    Location
    Singapore
    With the kind help of a developer, there is now a Python script that converts these NHK TTML files into SRT.

    Go to https://github.com/nopol10/ttml for the script and instructions.
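This is not the code from that repository. It is a separate minimal sketch of the SRT-writing side only, assuming the cues are already available as plain `(start, end, text)` tuples in seconds. SRT numbers cues from 1, writes times as `HH:MM:SS,mmm` joined by ` --> `, and separates cues with a blank line. The function names `srt_timestamp` and `to_srt` are hypothetical:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # round to avoid float truncation errors
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Serialize (start, end, text) tuples into an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line between blocks separates consecutive cues.
    return "\n".join(blocks)
```

Write the result with `encoding="utf-8"` so the Japanese text survives, e.g. `open("out.srt", "w", encoding="utf-8").write(to_srt(cues))`.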


