VideoHelp Forum




  1. Hello guys, I'm new here!

    I've got a general question about the encoding of subtitle files. I often use Japanese subtitles and had to adapt my VLC player to be able to display them in the first place. But one general problem remains: with many subtitle files I find online, the text is not displayed correctly, whether I open the file in WordPad or, subsequently, in VLC player.

    Case in point: these subtitles for X-Men: First Class

    The sample page indicates that the subtitles are indeed in Japanese, but once I download the file and open it, I get what you see in the attached image. I noticed that the file seems to be encoded as ANSI, so I thought saving it as UTF-8 might restore the Japanese characters, but without success.

    This is a problem I've had lots of times with subtitle files found on the web, so I wanted to ask you guys with more experience what's wrong here.
    [Attached image: First Class.jpg]

  2. DECEASED
    Join Date: Jun 2009
    Location: Heaven
    That's not "ANSI", that's Shift_JIS. In order to convert and save the file correctly to UTF-8 or UTF-16, you must use a text editor that translates the Shift_JIS double bytes to their respective Unicode codepoints. EditPlus and EmEditor are two well-known, decent text editors. You can give JWPce a try as well.
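
    To make that byte-level translation concrete, here is a minimal sketch in Python (the example character is mine, not from the thread): the Shift_JIS double-byte pair 0x93 0xFA encodes 日, which a Unicode-aware editor maps to the codepoint U+65E5 and, when saved as UTF-8, to three bytes.

    Code:
    # Sketch only: how one Shift_JIS double-byte pair maps to a Unicode codepoint
    sjis_bytes = b"\x93\xfa"               # Shift_JIS encoding of the character 日
    char = sjis_bytes.decode("shift_jis")  # translate to the Unicode codepoint
    print(char, hex(ord(char)))            # -> 日 0x65e5
    print(char.encode("utf-8"))            # -> b'\xe6\x97\xa5' (the same character in UTF-8)
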
    [Attached image: JWPce-demo.png]

    [Attached file]
    Last edited by El Heggunte; 2nd Jan 2015 at 21:41. Reason: clarity
  3. Interesting! Thanks a bunch for the explanation, and for already attaching the converted file!

    I tried to reproduce the process with EditPlus, but failed. Which software did you use? Was it JWPce as in the screenshot?

    Also, as none of the players I used was able to read the subtitle files in Shift_JIS encoding, do you happen to know why some files are being uploaded in that format?
    Last edited by Holofernes; 2nd Jan 2015 at 21:50. Reason: One sentence missing.
  4. DECEASED
    Join Date: Jun 2009
    Location: Heaven
    JWPce was used just for generating the screenshot.
    The conversion was done with EditPlus:

    1) open the original file
    2) change the screen font to a Japanese monospaced font (e.g., MS Gothic)
    3) reload the document as..., then select the appropriate encoding
    4) save as a Unicode file.
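
    For anyone who prefers to script the same conversion, here is a minimal sketch in Python (the file names are placeholders, not the actual files from this thread):

    Code:
    # Minimal sketch: re-encode a Shift_JIS subtitle file as UTF-8
    with open("input.srt", "r", encoding="shift_jis") as src:
        text = src.read()                  # decode the Shift_JIS bytes to Unicode text

    # "utf-8-sig" prepends a BOM so editors and players can identify the encoding;
    # plain "utf-8" also works if the target player dislikes a BOM
    with open("output.srt", "w", encoding="utf-8-sig") as dst:
        dst.write(text)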

    Originally Posted by Holofernes
    do you happen to know why some files are being uploaded in that format?
    Because most people, regardless of their respective native tongues, simply ignore the purpose and usefulness of Unicode.
    And the expression "most people" surely (and primarily) includes the designers of operating systems and hardware, software developers, webmasters and web designers, the whole "IT folks", so to speak -.-

    Below is the mess that we've got because of backward compatibility with the obsolete and narrow-minded thing named 'ASCII':

    http://en.wikipedia.org/wiki/Binary-to-text_encoding#Encoding_standards

    http://en.wikipedia.org/wiki/UTF-8

    Code:
    Putting it simply, computer systems available in 2013 are squarely based on the
    limitations of 1975 hardware using paradigms and heuristics developed in 1956.
  5. Member
    Join Date: Jul 2011
    Location: Denmark
    You can also use Subtitle Edit - "File" -> "Import subtitle with manually chosen encoding..." - try it.
    It will suggest an encoding and show a preview:
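
    Subtitle Edit's detection is built into the program itself; purely to illustrate the same idea in script form, here is a rough sketch in Python using the third-party chardet library (my choice for the example, not something Subtitle Edit uses), with a placeholder file name:

    Code:
    # Rough sketch: guess a subtitle file's encoding and preview the decoded text
    import chardet

    with open("subtitle.srt", "rb") as f:  # placeholder file name
        raw = f.read()

    guess = chardet.detect(raw)            # e.g. {'encoding': 'SHIFT_JIS', 'confidence': 0.99, ...}
    print(guess["encoding"], guess["confidence"])
    if guess["encoding"]:
        print(raw.decode(guess["encoding"], errors="replace")[:200])   # rough preview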


    I fully agree with "El Heggunte" that Unicode should be used today - Unicode files normally have a BOM header which identifies them as e.g. UTF-8, so these files can be opened correctly all over the world. Non-Unicode files like ANSI rely on the current computer's settings, which is really bad.
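
    That BOM is easy to check for by hand; a minimal sketch in Python, again with a placeholder file name:

    Code:
    # Minimal sketch: look for a Unicode BOM in the first bytes of a file
    import codecs

    with open("subtitle.srt", "rb") as f:  # placeholder file name
        head = f.read(4)

    if head.startswith(codecs.BOM_UTF8):
        print("UTF-8 with BOM")
    elif head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        print("UTF-16")
    else:
        print("no BOM - the encoding has to be guessed from the bytes themselves")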

    I like this article about text encoding.
  6. DECEASED
    Join Date: Jun 2009
    Location: Heaven
    Originally Posted by Nikse
    http://www.smashingmagazine.com/2012/06/06/all-about-unicode-utf8-character-sets/
    Thanks for the URL.

    I must say that I totally disagree with the "UTF-8 To The Rescue" point-of-view...

    Originally Posted by all-about-unicode-utf8-character-sets
    Best of all it is backward compatible with ASCII. Unlike some of the other proposed solutions, any document written only in ASCII, using only characters 0-127, is perfectly valid UTF-8 as well – which saves bandwidth and hassle.
    ASCII must die, period.
    Why "save bandwidth" and "storage space" in the age of on-the-fly compression, broadband connections and terabyte-sized HDDs?
    Oh, I see, easy money speaks louder.
    And why have I still not seen a single page of C or C++ source code written as a UTF-16 file?
    Oh, I see, laziness and PSEUDO-"productivity" speak louder too -.-