VideoHelp Forum
  1. I found another OCR program, specifically for Chinese character OCR.

    The original discussion/suggestion follows:

    @lansing
    I've used Microsoft Office Document Imaging (MODI) for OCR of Traditional Chinese characters with another subtitle program called IdxSubOcr, and the accuracy is close to 99%. However, when I use the same option in Subtitle Edit, the accuracy is less than 5%: most of the time it doesn't recognize the characters, and the process is VERY slow. I tried changing the image palette, but there's hardly any improvement.
    @Nikse555 (response):
    @lansing: I don't read Chinese, so perhaps you could find their source code and compare it with SE's? Perhaps they do some image scaling or other tricks?
    I did not have MODI installed, but I do now: http://www.microsoft.com/en-us/download/details.aspx?id=21581 (choose a custom setup and select only MODI)
    If someone wants to have a go at improving this, you can create a fork of the SE source code on GitHub: https://github.com/SubtitleEdit/subtitleedit/fork
    Original Source: http://forum.doom9.org/showthread.php?t=162721&page=7


    Here is what I could find about that program:


    --------------------------------------------------------------------------------------------------------------------------------------------------------

    Program name / description:

    IdxSubOcr: An application to OCR Vobsub (idx/sub) files in Chinese, Japanese and English.
    Features: A program dedicated to OCR (optical character recognition) of subtitles in VobSub format. It converts idx/sub subtitles to SRT format and provides SRT proofreading features. The OCR engine is Microsoft Office Document Imaging (MODI), which ships with Microsoft Office 2003 and supports English, Simplified Chinese, Traditional Chinese and Japanese.

    Motivation: There are already some free subtitle OCR programs: Subresync can be used for English and SubOCR for Chinese. But after using them, I decided to develop IdxSubOcr for the following reasons:

    - Ease of use. Subresync comes with an OCR engine that has a great recognition rate, but typing in dozens of letters by hand every time is too much trouble, and it does not support Chinese or Japanese characters.
    - SubOCR is too large and throws runtime errors on some machines.

    Note: IdxSubOcr outputs its Chinese and Japanese recognition results in GBK encoding, so it can only be used in an environment that supports GBK. Windows 2000/XP generally have no problem, Windows Me may work with luck, and Windows 98 probably will not.


    --------------------------------------------------------------------------------------------------------------------------------------------------------

    ---> http://www.comicer.com/stronghorse/software/html/IdxSubOcr.htm (in Chinese)

    ---> http://translate.google.com.br/translate?hl=pt-BR&sl=zh-CN&tl=en&prev=_dd&u=http%3A%2F...FIdxSubOcr.htm

    ---> http://translate.googleusercontent.com/translate_c?depth=1&hl=pt-BR&prev=_dd&rurl=tran...RNQTYxX6jjik4w

    ---> http://www.56.com/u43/v_NTgwMTUwODg.html (video tutorial, in Chinese)

    ---> http://forum.doom9.org/showthread.php?t=154536

    ---> http://en.wikipedia.org/wiki/Microsoft_Office_Document_Imaging

    ---> http://www.findthatzip-file.com/search-4898377-hZIP/winrar-winzip-download-idxsubocr.zip.htm (download)

    ---> http://www.comicer.com/stronghorse/software/exe/IdxSubOcr.zip (download)


    PS: For those who are curious about the program, scan the file with up-to-date antivirus software before opening it. Just in case, you know!

    Thanks.

    devil (johner)
    Last edited by devilcoelhodog; 27th Feb 2014 at 07:13.
  2. Interesting. Converting VobSubs for Chinese/Japanese is tedious work indeed, which is most probably why there are so few such subs lying around. An easy solution would be a godsend. Looking forward to trying this out, though I cannot speak Chinese.

    Thanks for posting devilcoelhodog.

    ##edit:

    I had a quick chance to try it out on Japanese VobSubs, and I must say I am quite impressed by the initial results, though the subs will most likely need to be corrected manually afterwards. It probably depends on the subtitles, but in my limited experience the problems were:

    - question marks were converted to the character "つ" or "っ"
    - Latin letters will inevitably go wrong, as they are interpreted as Japanese
    - any kind of irregularity, such as smaller text on top of subtitles or vertical text, won't convert well
    - obviously some kanji, and even hiragana and katakana, will go wrong

    I also needed to change the colour selection as shown below. Otherwise the kanji come out all fckd up.

    [Attachment: ocr-color-select.jpg]
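    Why the colour selection matters: a DVD subpicture stores each pixel as one of four palette entries (background, pattern and two emphasis colours), and which entry holds the glyph fill versus the outline can differ between discs. Below is a minimal sketch of that preprocessing step in Python, assuming the subtitle bitmap has already been decoded into rows of palette indices 0-3. The function names `binarize_subpicture` and `to_pgm` are hypothetical and not taken from IdxSubOcr, Subtitle Edit or MODI, and the integer upscale only illustrates the kind of "image scaling" trick guessed at earlier in the thread; it is not confirmed to be what either program does.

```python
def binarize_subpicture(indices, text_indices, scale=1):
    """Map decoded subpicture palette indices (0-3) to a black-on-white bitmap.

    Hypothetical helper (assumption, not from any real tool):
    indices      -- list of rows, each a list of ints in range(4)
    text_indices -- set of palette indices treated as glyph pixels,
                    i.e. the hand-picked "text" colours
    scale        -- integer nearest-neighbour upscale factor
    Returns rows of 0 (black, text) and 255 (white, everything else).
    """
    if scale < 1:
        raise ValueError("scale must be >= 1")
    out = []
    for row in indices:
        mapped = []
        for idx in row:
            if not 0 <= idx <= 3:
                raise ValueError(f"palette index out of range: {idx}")
            value = 0 if idx in text_indices else 255
            mapped.extend([value] * scale)  # widen each pixel horizontally
        for _ in range(scale):  # repeat each row vertically
            out.append(list(mapped))
    return out


def to_pgm(bitmap):
    """Serialize an 8-bit greyscale bitmap as binary PGM (P5) bytes."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(v for row in bitmap for v in row)


# Usage: in this made-up 2x3 subpicture, index 1 is assumed to be the
# glyph fill and index 2 the outline, so only index 1 is kept as text.
sub = [[0, 1, 2],
       [2, 1, 0]]
img = binarize_subpicture(sub, text_indices={1}, scale=2)
pgm_bytes = to_pgm(img)
```

    Picking the wrong index (for example the outline instead of the fill) produces hollow or blobby glyphs, which is consistent with the kanji going wrong when the default colours are used.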

    I'll do a little guide later when I have the time.
    Last edited by chiappa; 24th Aug 2014 at 23:52.


