After having been forced to study this subject a little harder
(please see thread https://forum.videohelp.com/topic333501.html),
I have concluded that the alleged "Unicode support" in TextSub
and in the SubStation Alpha file formats is lame, to say the
least. As a long-standing member of the VideoHelp community has already
stated several times, many people do not (want to) understand
what the heck Unicode actually is. I'm afraid many people still think
that being able to handle different charsets/encodings simultaneously
is the same as being "Unicode-ready". It is not. Yes, .SSA files plus the
DirectVobSub filter really do manage to display different writing systems
at the same time, but they do not appear to use Unicode at all to
achieve that goal. In other words, they simply juggle several
codepages at the same time --- which is a completely different story,
period. Putting it simply: in non-American Western codepages,
one character is a single byte between 80 and FF plus a language flag;
in CJK codepages, one character is two bytes in the 80-FF range plus a language flag;
whereas in Unicode/UTF-8, one character is one to four bytes with
no language flag at all.
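To see the difference in practice, here is a minimal Python sketch (my illustration, not part of the original thread) comparing how one Cyrillic character is stored by a legacy single-byte codepage, by Shift_JIS, and by UTF-8:

```python
# One character, three encodings: the legacy codepages need an
# out-of-band "language flag" to say WHICH codepage the bytes
# belong to; UTF-8 needs no such flag.
ch = "Я"  # CYRILLIC CAPITAL LETTER YA

for codec in ("cp1251", "shift_jis", "utf-8"):
    data = ch.encode(codec)
    print(f"{codec:>9}: {len(data)} byte(s) -> {data.hex(' ')}")

# Output:
#    cp1251: 1 byte(s) -> df
# shift_jis: 2 byte(s) -> 84 60
#     utf-8: 2 byte(s) -> d0 af
```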
no language-flag. Web browsers usually have no problem in reading real UTF-texts or
even binary Unicode files, however I still
haven't found a way to make Vsfilter.dll(v. 2.33) show such capability.
The subtitle picture that I posted to the thread mentioned above
was generated with a script that called only two codepages, namely
ANSI(code 0) and Japanese/Shift_JIS(code 128), notwithstanding,
it was able and sufficient to display Greek and Cyrillic characters
as well, simply because basic Greek and Cyrillic characters already
are a subset of Shift_JIS. On the other hand, the picture shown below
was gotten from a script that (appropriately???) called five codepages.
I hope these considerations can be helpful.
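For reference, those "codepage calls" are made through the Encoding field, the last value on each line of the [V4 Styles] section. A hypothetical fragment with one Western style (Encoding 0 = ANSI) and one Japanese style (Encoding 128 = Shift_JIS), similar in spirit to the two-codepage script described above:

```
[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Western,Arial,24,16777215,65535,65535,0,0,0,1,2,0,2,20,20,20,0,0
Style: Japanese,MS Gothic,24,16777215,65535,65535,0,0,0,1,2,0,2,20,20,20,0,128
```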
==============================================
[screenshot: subtitle rendering generated with five codepages]
Midzuki wrote:
I have concluded that the alleged "Unicode support" in TextSub and in the SubStation Alpha file formats is lame, to say the least. [...] Web browsers usually have no problem reading real UTF-8 texts or even binary Unicode (UTF-16) files; however, I still haven't found a way to make VSFilter.dll (v. 2.33) show the same capability.
Correction: yes, VSFilter WILL correctly interpret UTF-8/Unicode SSA scripts,
but only under (one of) the following conditions:
--- UTF-8 subtitle scripts MUST begin with the µ$oft-invented "UTF-8 signature"
{bytes EF BB BF};
--- UTF-16 ("Unicoded") subtitle files MUST be "little-endian"
{byte-order mark = FF FE}.
Besides, every formatting style had better have its "Encoding" field set to "1"
(= Default). That way, subtitles can display ALL printable characters contained in the font Arial Unicode MS without any need to jump from one codepage to another ^_^
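As a minimal sketch of how to satisfy the UTF-8 condition (my own example, not from the thread): Python's "utf-8-sig" codec prepends the EF BB BF signature automatically, so a script saved this way is recognized without any manual byte-pushing.

```python
# Save a (hypothetical) minimal SSA script as UTF-8 *with* the
# EF BB BF signature that VSFilter looks for.
script = (
    "[Script Info]\n"
    "Title: BOM demo\n"
    "ScriptType: v4.00\n"
)

with open("demo.ssa", "w", encoding="utf-8-sig") as f:
    f.write(script)

# Confirm the file really starts with the UTF-8 signature.
with open("demo.ssa", "rb") as f:
    assert f.read(3) == b"\xef\xbb\xbf"
```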
Again, I hope this info will be useful.
=======================================
Time for another overdue correction. Earlier I wrote:
--- Unicoded subtitle files MUST be "little-endian"
{byte-order mark = FF FE};
In fact they can be little-endian OR big-endian, as long as the first two bytes
of the file are a byte-order mark (FF FE or FE FF). And now, at last,
the actual source of my current problems with subtitles is revealed: TextPad!
The damn text editor by default does *not* write the UTF-8/Unicode BOMs!
The solution I have found: I'm moving to Microsoft's WordPad!
And here goes my question to the makers of TextPad (Helios Software):
what is the point of opening and saving in UTF-8 or Unicode
if the damn application cannot get rid of the short-sighted limitations of the
narrow-minded, language-segregating codepage scheme?
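For anyone bitten by the same editor behaviour, here is a small sketch (again mine, not from the thread) that reports which byte-order mark, if any, a saved file begins with:

```python
# Identify the BOM at the start of a file: EF BB BF marks UTF-8,
# FF FE marks UTF-16 little-endian, FE FF marks UTF-16 big-endian.
def sniff_bom(path):
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 with signature"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16 little-endian"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16 big-endian"
    return "no BOM (plain codepage text?)"

print(sniff_bom("demo.ssa"))
```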
=================================
I must ask:
Why?
Isn't the purpose of subtitles to translate one or many spoken languages into text in the language understood by the viewer?
So: One subtitle file - one language?
/Mats
It seems there is some misunderstanding, I'm afraid :-(
Mats wrote:
Isn't the purpose of subtitles to translate one or many spoken languages into text in the language understood by the viewer?
If that means "subtitle tools which fully support UTF-8 and Unicode are necessarily useless", then...
I have no idea what kind of software one had better use when the goal is
to produce subtitles in languages written in Devanagari or similar "exotic" scripts,
for example, but I seriously doubt you could do it inside the preset
extended-ASCII codepages handled by TextSub and .SSA.
===
Ah. Ok. I was under the impression your problem was not to create subtitles with different character sets, but to have multiple character sets in one subtitle file.
/Mats
Mats wrote:
I was under the impression your problem was... to have multiple character sets in one subtitle file.
Exactly, and Unicode was created precisely to allow anyone to correctly represent several languages within the same document (OR in a subtitle file, as I wanted).
Regards,
Midzuki.
===