VideoHelp Forum




+ Reply to Thread
Results 1 to 10 of 10
  1. Banned
    Join Date
    Jun 2007
    Location
    UNREACHABLE
    Search Comp PM
    After having been forced to study this subject a little harder
    (please see thread https://forum.videohelp.com/topic333501.html?sid=3219e3ecdb111fb5a6c3895b6cb95a20),
    I have concluded that the alleged "Unicode-support" in TextSub
    and in the SubStation Alpha file formats is lame, to say the
    least. As an older member of the VideoHelp community has already
    stated for several times, many people do not (want to) understand
    very well what the heck Unicode is. I'm afraid many people still think
    that being capable to deal with different charsets/encodings simultaneously
    is the same as being "Unicode-ready". Not the case. Yes, .SSA files plus
    DirectVobSub filter really manage to display different writing systems
    at the same time, but actually they do not seem to use Unicode at all for
    achieving that goal. In other words, they simply manage to handle several
    codepages at the same time --- which is a completely different story,
    period. Putting things in a simple way: non-American Western codepages =
    one character in one-byte between 80 and FF plus a language-flag, and
    CJK codepages = one character in two-bytes between 80 and FF plus a language-flag,
    whereas Unicode/UTF-8 = one character in two-bytes from 00 thru FF plus
    no language-flag. Web browsers usually have no problem in reading real UTF-texts or
    even binary Unicode files, however I still
    haven't found a way to make Vsfilter.dll(v. 2.33) show such capability.

    The subtitle picture that I posted to the thread mentioned above
    was generated with a script that called only two codepages, namely
    ANSI(code 0) and Japanese/Shift_JIS(code 128), notwithstanding,
    it was able and sufficient to display Greek and Cyrillic characters
    as well, simply because basic Greek and Cyrillic characters already
    are a subset of Shift_JIS. On the other hand, the picture shown below
    was gotten from a script that (appropriately???) called five codepages.
    I hope these considerations can be helpful.

    ==============================================

    Quote Quote  
  2. Banned
    Join Date
    Jun 2007
    Location
    UNREACHABLE
    Search Comp PM
    Midzuki wrote:

    I have concluded that the alleged "Unicode-support" in TextSub
    and in the SubStation Alpha file formats is lame, to say the
    least.
    Web browsers usually have no problem in reading real UTF-texts or
    even binary Unicode files, however I still
    haven't found a way to make Vsfilter.dll(v. 2.33) show such capability.
    Well, now I have to admit I was wrong (but not entirely).
    Yes, Vsfilter will interpret correctly UTF8/Unicode SSA scripts,
    but only under (one of) the following conditions:

    --- UTF sub scripts MUST begin with µ$oft-inventioned "UTF8 ID"
    {bytes EF BB BF};

    --- Unicoded subtitle files MUST be "little-endian"
    {byte-order mark = FF FE};

    Besides, every formatting-style would better have its "Encoding" field set to "1"
    ( = default). In this way, subtitle pics can display ALL printable characters contained in the font Arial Unicode MS without any need to jump from one codepage to another ^_^

    Again, I hope this info will be usefull.

    =======================================
    Quote Quote  
  3. Banned
    Join Date
    Jun 2007
    Location
    UNREACHABLE
    Search Comp PM
    Time for another overdue correction:

    --- Unicoded subtitle files MUST be "little-endian"
    {byte-order mark = FF FE};
    Not at all. Vsfilter does not care whether the Unicode file is
    little-endianed or big-endianed, as long as the first two bytes
    of the file are a Byte-Order Mark. And now, at last,
    appears the actual source of my current problems with subtiles: TextPad!
    The damn text editor by default does *not* write the UTF-8/Unicode BOMs!
    The solution I have found: I'm moving to Microsoft's Wordpad!
    And here goes the question to the makers of TextPad (Helios Software):

    What is the point of opening and saving in UTF-8 or Unicode
    if the damn application cannot get rid of the short-sighted limitations of the
    narrow-minded and language-segregating codepage scheme?

    =================================

    Quote Quote  
  4. Member mats.hogberg's Avatar
    Join Date
    Jul 2002
    Location
    Sweden (PAL)
    Search Comp PM
    I must ask:
    Why?
    Isn't the purpose of subtitles to translate one or many spoken languages into text in the language understood by the viewer?
    So: One subtitle file - one language?

    /Mats
    Quote Quote  
  5. Banned
    Join Date
    Jun 2007
    Location
    UNREACHABLE
    Search Comp PM
    It seems there is some misunderstanding, I'm afraid :-(

    Isn't the purpose of subtitles to translate one or many spoken languages into text in the language understood by the viewer?
    Of course. Notwithstanding, if when you say that, you mean "subtitle tools that
    fully-support UTF-8 and Unicode are necessarily useless", then...

    I have no idea of what kind of software one'd better use when the goal is
    produce subtitles in languages written in Devanagari or similar "exotic" charsets
    for example, but I serious doubt you would be able to do it inside the preset
    extended-ASCII codepages handled by TextSub and .SSA.

    ===

    Quote Quote  
  6. Member mats.hogberg's Avatar
    Join Date
    Jul 2002
    Location
    Sweden (PAL)
    Search Comp PM
    Ah. Ok. I was under the impression your problem was not to create subtitles with different character sets, but to have multiple character sets in one subtitle file.

    /Mats
    Quote Quote  
  7. Banned
    Join Date
    Jun 2007
    Location
    UNREACHABLE
    Search Comp PM
    I was under the impression your problem was... to have multiple character sets in one subtitle file.
    That was the problem of the OP that started the thread

    and Unicode was created exactly to allow anyone to represent correctly several languages within a same document (OR in a subtitle file, as he wanted).

    Regards,

    Midzuki.

    ===
    Quote Quote  
  8. Member
    Join Date
    Jul 2014
    Location
    焦作/河南
    Search Comp PM
    I feel interesting that my browser took me here when I was trying to do some studies on the generate code 128 barcode in word. Anyway, this thread do attract my interests, sounds like a good place for me to get some leisure.
    Quote Quote  
  9. I'm a MEGA Super Moderator Baldrick's Avatar
    Join Date
    Aug 2000
    Location
    Sweden
    Search Comp PM
    Originally Posted by nihenaita View Post
    I feel interesting that my browser took me here when I was trying to do some studies on the generate code 128 barcode in word. Anyway, this thread do attract my interests, sounds like a good place for me to get some leisure.
    What?
    Quote Quote  
  10. Originally Posted by nihenaita View Post
    I feel interesting that my browser took me here when I was trying to do some studies on the generate code 128 barcode in word. Anyway, this thread do attract my interests, sounds like a good place for me to get some leisure.
    Because code 128 encode ASCII Character Set and Midzuki mentioned ASCII. So, google may take you to this site.
    Quote Quote  



Similar Threads

Visit our sponsor! Try DVDFab and backup Blu-rays!