I found another OCR program, specifically for Chinese characters.
The discussion / suggestion follows below:
@lansing:
I've used Microsoft Office Document Imaging (MODI) for OCR of traditional Chinese characters with another subtitle program called IdxSubOcr, and the accuracy is close to 99%. However, when I use the same option in Subtitle Edit, the accuracy is less than 5%: most of the time it doesn't recognize the characters, and the process is VERY slow. I tried changing the image palette, but there's hardly any improvement.
Original source: http://forum.doom9.org/showthread.php?t=162721&page=7

@Nikse555 (response):
@lansing: I don't read Chinese, so perhaps you can find the source code and compare it with SE? Perhaps they do some image scaling or some other tricks?
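To make the "image scaling" guess concrete, here is a minimal sketch of one such preprocessing trick. It is NOT taken from IdxSubOcr's source; it only assumes that small DVD subtitle glyphs are harder for a document-oriented engine like MODI to read, and that enlarging them first might help. The bitmap is modelled as a plain list of pixel rows, and `upscale_nearest` is a hypothetical helper name.

```python
def upscale_nearest(bitmap, factor):
    """Enlarge a bitmap (list of rows of pixel values) by an integer factor.

    Nearest-neighbour sampling just repeats pixels, so glyph edges stay hard
    instead of being blurred by interpolation.
    """
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in bitmap:
        # Repeat each pixel horizontally...
        wide = [px for px in row for _ in range(factor)]
        # ...then repeat the widened row vertically (fresh list each time).
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    glyph = [[0, 1],
             [1, 0]]
    for row in upscale_nearest(glyph, 2):
        print(row)
```

Whether MODI actually benefits from this, and by which factor, would have to be tested against real subtitles.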
I did not have MODI installed, but I do now: http://www.microsoft.com/en-us/download/details.aspx?id=21581 (choose a custom setup and select only MODI)
If someone wants to have a go at improving this, you can create a fork of the SE source code on GitHub: https://github.com/SubtitleEdit/subtitleedit/fork
Below is what I could find about this program:
--------------------------------------------------------------------------------------------------------------------------------------------------------
Program name / description:
IdxSubOcr: An application to OCR VobSub (idx/sub) files in Chinese, Japanese and English.
Features: A dedicated OCR (Optical Character Recognition) tool for the VobSub subtitle format. It can convert idx/sub subtitles to SRT format and provides proofreading features for the SRT output. Its OCR engine is Microsoft Office Document Imaging (MODI), which ships with Microsoft Office 2003, and it supports English, Simplified Chinese, Traditional Chinese and Japanese.
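The last step of an idx/sub to SRT conversion is writing the recognized text out in SRT form. The sketch below shows only that step and is not IdxSubOcr's code: it assumes the display times (in milliseconds) have already been read from the .idx/.sub pair and the text has already been OCR'd. `srt_timestamp` and `to_srt` are hypothetical names.

```python
def srt_timestamp(ms):
    """Format a time in milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(entries):
    """Build SRT text from (start_ms, end_ms, text) tuples.

    Each cue is: running index, 'start --> end' line, the text, then a blank
    line separating it from the next cue.
    """
    blocks = []
    for index, (start, end, text) in enumerate(entries, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(1000, 3500, "你好"), (4000, 6000, "再见")]))
```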
Motivation: There are already some free subtitle OCR programs: Subresync can be used for English, and SubOCR for Chinese. But after using them, I decided to develop IdxSubOcr for the following reasons:
- To improve ease of use. Subresync comes with an OCR engine with a good recognition rate, but typing in dozens of letters every time is too much trouble, and it does not support Chinese or Japanese characters.
- SubOCR is too large and throws errors on some machines.
Description: The software outputs its Chinese and Japanese recognition results in GBK encoding, so it can only be used in an environment that supports GBK. Windows 2000/XP generally have no problem, Windows Me may work with luck, and Windows 98 probably will not.
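If the output really is GBK-encoded, as the description says, it can be re-encoded as UTF-8 so it displays correctly on systems or players that don't assume GBK. A minimal sketch using Python's built-in `gbk` codec; the function names are hypothetical, and the file paths are whatever your SRT files are called.

```python
from pathlib import Path


def gbk_bytes_to_utf8(data):
    """Decode GBK-encoded bytes and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the input is not valid GBK, which is a
    useful signal that the file was in some other encoding.
    """
    return data.decode("gbk").encode("utf-8")


def convert_file(src, dst):
    """Re-encode a GBK text file (e.g. an SRT) as UTF-8.

    Works on raw bytes so Windows CRLF line endings are kept unchanged.
    """
    Path(dst).write_bytes(gbk_bytes_to_utf8(Path(src).read_bytes()))
```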
--------------------------------------------------------------------------------------------------------------------------------------------------------
---> http://www.comicer.com/stronghorse/software/html/IdxSubOcr.htm ( in chinese )
---> http://translate.google.com.br/translate?hl=pt-BR&sl=zh-CN&tl=en&prev=_dd&u=http%3A%2F...FIdxSubOcr.htm
---> http://translate.googleusercontent.com/translate_c?depth=1&hl=pt-BR&prev=_dd&rurl=tran...RNQTYxX6jjik4w
---> http://www.56.com/u43/v_NTgwMTUwODg.html ( video tutorial / in chinese )
---> http://forum.doom9.org/showthread.php?t=154536
---> http://en.wikipedia.org/wiki/Microsoft_Office_Document_Imaging
---> http://www.findthatzip-file.com/search-4898377-hZIP/winrar-winzip-download-idxsubocr.zip.htm (download)
---> http://www.comicer.com/stronghorse/software/exe/IdxSubOcr.zip ( download )
PS: for those who are curious about the program, scan the file with up-to-date antivirus software before opening it. Just in case, you know!
Thanks.
devil (johner)
Last edited by devilcoelhodog; 27th Feb 2014 at 06:13.
Interesting. Converting vobsubs for Chinese/Japanese is tedious work indeed, which is most probably why there are so few such subs lying around. An easy solution would be a godsend. Looking forward to trying this out. I can't speak Chinese, though.
Thanks for posting devilcoelhodog.
Edit:
I had a quick chance to try it out on Japanese vobsubs, and I must say I am quite impressed with the initial results. It does look like the subs will most likely need to be corrected manually afterwards, though. It probably depends on the subtitles, but in my limited experience the problems were:
- question marks were converted to the character "つ" or "っ"
- Latin letters inevitably go wrong, since they are interpreted as Japanese
- any kind of irregularity, such as smaller text above the subtitles or vertical text, won't convert well
- obviously some kanji, and even hiragana & katakana, will go wrong
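Since manual correction is needed anyway, one option is to have a script point at the lines most likely to contain the question-mark misreading. This is a hypothetical helper, not a feature of IdxSubOcr. It only flags lines whose last character is つ or っ rather than replacing them, because plenty of genuine Japanese lines end that way (待つ, あっ).

```python
def suspicious_lines(text):
    """Return (line_number, line) pairs whose last non-space character is
    つ or っ, i.e. candidates for a misread question mark to check by hand."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.rstrip().endswith(("つ", "っ"))
    ]


if __name__ == "__main__":
    sample = "本当つ\nそうだね\n待っ"
    for number, line in suspicious_lines(sample):
        print(number, line)
```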
Also, I needed to change the colour settings; otherwise all the kanji come out completely messed up.
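The exact colour settings aren't given here, but the general idea can be sketched. DVD subtitle bitmaps use 2-bit pixels, i.e. indices into a 4-colour palette, and which index holds the glyph body differs from disc to disc. The hypothetical helper below maps the chosen "text" indices to black and everything else (background, outline) to white, assuming dark text on a light background is what a document OCR engine handles best.

```python
def remap_palette(indices, text_indices):
    """Turn a bitmap of 2-bit palette indices (0-3) into black text (0) on
    a white background (255).

    text_indices lists which of the four palette entries hold the glyph
    body on this particular disc; all other entries become background.
    """
    text = set(text_indices)
    for index in text:
        if index not in (0, 1, 2, 3):
            raise ValueError(f"palette index out of range: {index}")
    return [[0 if px in text else 255 for px in row] for row in indices]


if __name__ == "__main__":
    bitmap = [[0, 1, 2],
              [3, 1, 0]]
    print(remap_palette(bitmap, text_indices=[1]))
```

If the kanji still come out wrong, trying a different `text_indices` choice (for example including the outline colour) is the quickest thing to vary.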
I'll do a little guide later when I have the time.
Last edited by chiappa; 24th Aug 2014 at 22:52.