VideoHelp Forum
  1. Member
    Join Date: Mar 2006
    Location: Brazil
    I found another OCR program, one aimed specifically at Chinese characters.

    The discussion that prompted this follows:

    @lansing
    I've used Microsoft Office Document Imaging for OCR on Traditional Chinese characters with another subtitle program called IdxSubOcr, and the accuracy is close to 99%. However, when I use the same option in Subtitle Edit, the accuracy is less than 5%: most of the time it doesn't recognize the characters, and the process is VERY slow. I tried changing the image palette, but there was hardly any improvement.
    @Nikse555 (response):
    @lansing: I don't read Chinese, so perhaps you can find the source code and compare it with SE? Perhaps they do some image scaling or some other tricks?
    I did not have MODI installed, but I do now - http://www.microsoft.com/en-us/download/details.aspx?id=21581 (customize setup and choose only MODI)
    If someone wants to have a go at improving this, you can create a fork of the SE source code on GitHub: https://github.com/SubtitleEdit/subtitleedit/fork
    Original Source: http://forum.doom9.org/showthread.php?t=162721&page=7


    Here is what I could find about the program:


    --------------------------------------------------------------------------------------------------------------------------------------------------------

    Program name / description:

    IdxSubOcr: An application to OCR VobSub (idx/sub) files in Chinese, Japanese and English.
    Features: A program dedicated to OCR (Optical Character Recognition) of VobSub subtitles. It converts idx/sub subtitles to srt and provides proofreading features for the srt output. The OCR engine is Microsoft Office Document Imaging (MODI), which ships with Microsoft Office 2003; it supports English, Simplified Chinese, Traditional Chinese and Japanese.

    Motivation (in the author's words): Some free subtitle OCR programs already exist: Subresync can be used for English and SubOCR for Chinese. But after using them, I decided to develop IdxSubOcr, for the following reasons:

    Ease of use. The engine that comes with Subresync has a good recognition rate, but having to type in dozens of letters every time is too much trouble, and it does not support Chinese or Japanese characters.
    SubOCR is too large and fails to run on some machines.

    Note: The recognition results for Chinese and Japanese are GBK-encoded, so the software can only be used in an environment that supports GBK encoding. Windows 2000/XP generally works without problems, Windows Me may work with some luck, and Windows 98 probably will not.
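
    Since the results come out GBK-encoded, the saved .srt may show up as garbage in tools that expect UTF-8. Below is a minimal Python sketch for re-encoding such a file. It assumes the file really is plain GBK text; the function name gbk_srt_to_utf8 and any file names are hypothetical, not something IdxSubOcr provides.

```python
from pathlib import Path


def gbk_srt_to_utf8(src: Path, dst: Path) -> int:
    """Re-encode a GBK subtitle file as UTF-8; return the number of characters written."""
    # Assumption: src was saved in GBK. Bytes that are not valid GBK are
    # replaced with U+FFFD instead of aborting the whole conversion.
    text = src.read_bytes().decode("gbk", errors="replace")
    # Work on bytes so the original line endings (usually CRLF) stay untouched.
    dst.write_bytes(text.encode("utf-8"))
    return len(text)
```

    For example, gbk_srt_to_utf8(Path("movie.srt"), Path("movie.utf8.srt")) would write a UTF-8 copy next to the original. If the output contains many replacement characters, the source file was probably not GBK in the first place.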


    --------------------------------------------------------------------------------------------------------------------------------------------------------

    ---> http://www.comicer.com/stronghorse/software/html/IdxSubOcr.htm (in Chinese)

    ---> http://translate.google.com.br/translate?hl=pt-BR&sl=zh-CN&tl=en&prev=_dd&u=http%3A%2F...FIdxSubOcr.htm

    ---> http://translate.googleusercontent.com/translate_c?depth=1&hl=pt-BR&prev=_dd&rurl=tran...RNQTYxX6jjik4w

    ---> http://www.56.com/u43/v_NTgwMTUwODg.html (video tutorial, in Chinese)

    ---> http://forum.doom9.org/showthread.php?t=154536

    ---> http://en.wikipedia.org/wiki/Microsoft_Office_Document_Imaging

    ---> http://www.findthatzip-file.com/search-4898377-hZIP/winrar-winzip-download-idxsubocr.zip.htm (download)

    ---> http://www.comicer.com/stronghorse/software/exe/IdxSubOcr.zip ( download)


    PS: If you are curious about the program, scan the file with up-to-date antivirus software before opening it. Just in case, you know!

    Thanks.

    devil (johner)
    Last edited by devilcoelhodog; 27th Feb 2014 at 06:13.