VideoHelp Forum
Thread
  1. Member
    Join Date
    Oct 2011
    Location
    Spain
    EDIT: Solved thanks to Nikse's great program, SubtitleEdit.

    Hi, I'm new here (I'm a noob, but I've been trying options and reading forums for days, and I can't find a solution).

    I'm trying to learn how to convert the graphic subtitles of a video recorded from DVB into text-based subtitles.
    On the video, the subtitles look like this (actually much sharper, not blurry at all; VLC makes weird captures), with different colors to tell the speakers apart:



    I'd like to convert these subtitles to SRT so I can watch the recorded program on devices other than the computer. A quick process would be perfect, since I'll only watch the program and then delete it.

    Before I explain what I've tried so far, here is a sample (just 27 MB) in case anyone wants to try it: http://www.megaupload.com/?d=W82MAKEX (the original recorded file is called "(grabación original) subs digitales.mpeg")

    What I've tried so far:
    -The recorded video is an MPEG TS file that includes all streams (video, two audio tracks in Spanish and English, and graphic Spanish subtitles).
    -I've demuxed all streams with ProjectX, trying several subtitle output options: SON+BMP, SUP, and IDX+SUB.
    -I've opened the IDX+SUB with SubRip and SubResync, but it's impossible to get a proper OCR:





    -Same for the SUP file with DVDSubEdit 1.52; the quality is far too poor for OCR:



    -As for SON+BMP, it's the best option: the BMP images are just perfect for OCR, and the SON file has the timing for each subtitle, BUT I haven't found any "SON+BMP to SRT" program...

    Here is an extract of the SON file:

    Code:
    SP_NUMBER	START		END		FILE_NAME
    Color		(0 1 2 3)
    Contrast	(0 2 7 11)
    Display_Area	(000 474 720 562)
    0000		00:00:02:16	00:00:05:07	subs digitales_st00000p4.bmp
    Color		(0 8 2 1)
    Contrast	(0 4 7 2)
    Display_Area	(000 426 720 558)
    0001		00:00:07:01	00:00:09:21	subs digitales_st00001p4.bmp
    Color		(0 0 1 2)
    Contrast	(0 0 2 7)
    Display_Area	(000 426 720 514)
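    Reading the extract, each entry line seems to carry a number, a start and an end timecode and a bitmap name, preceded by its own Color/Contrast/Display_Area lines. The heights agree with the GOCR output later in this thread (558 - 426 = 132 px for st00001, 514 - 426 = 88 px for st00002). A minimal Python sketch of a parser under those assumptions; it also assumes the last timecode field counts frames at 25 fps (PAL DVB), which is a guess, not something the SON file states. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

FPS = 25  # assumption: the 4th timecode field counts PAL frames

# e.g. "0000    00:00:02:16    00:00:05:07    subs digitales_st00000p4.bmp"
ENTRY_RE = re.compile(r"^(\d+)\s+(\d\d:\d\d:\d\d:\d\d)\s+(\d\d:\d\d:\d\d:\d\d)\s+(.+?)\s*$")
# e.g. "Display_Area    (000 474 720 562)" -> x1 y1 x2 y2
AREA_RE = re.compile(r"^Display_Area\s+\((\d+)\s+(\d+)\s+(\d+)\s+(\d+)\)")


@dataclass
class SonEntry:
    number: int
    start_ms: int
    end_ms: int
    bitmap: str
    area: Optional[Tuple[int, ...]]  # Display_Area seen just before this entry


def timecode_to_ms(tc: str, fps: int = FPS) -> int:
    """HH:MM:SS:FF -> milliseconds (FF = frame number)."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + round(f * 1000 / fps)


def parse_son(text: str) -> List[SonEntry]:
    entries: List[SonEntry] = []
    area = None
    for raw in text.splitlines():
        line = raw.strip()
        m = AREA_RE.match(line)
        if m:
            area = tuple(int(v) for v in m.groups())
            continue
        m = ENTRY_RE.match(line)
        if m:  # the header, Color and Contrast lines never match
            num, start, end, bmp = m.groups()
            entries.append(SonEntry(int(num), timecode_to_ms(start), timecode_to_ms(end), bmp, area))
            area = None
    return entries
```

    With the 25 fps assumption, entry 0000 runs from 2640 ms to 5280 ms; if the recording turns out to use another frame rate, only `FPS` changes.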
    Here are a couple of BMP examples, crystal clear:



    I tried a free online OCR on that image and got this:



    Perfect recognition!

    So, I'm looking either for a "SON+BMP to SRT" program or for a way to extract proper, decent IDX+SUB or SUP files from the MPEG TS (my guess is that the subtitle colors are the problem).
    Any help is appreciated.
    Thanks in advance.
    Last edited by edea; 24th Oct 2011 at 16:18.
  2. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Hi edea!

    You could try subtitle edit: http://subtitleedit.googlecode.com/files/SubtitleEdit32Setup.zip
    SE should be able to import+ocr both sub/idx and son/bmp... I would like to add support for importing subtitles directly from ts files, but that will be a later version.
  3. Another avenue you can try is to open the idx/sub in BDSup2Sub and export it as ifo/sup, then load the ifo/sup in DVDSubEdit.
    Run automatic OCR and export as .srt. I've had good luck with English subs. I'm not sure whether the upside-down question marks will throw it off.

    Usually lines that don't auto OCR well with this method show up with one or more underscores. Sort of like this:
    ap__%%*&__7?_x

    If you search the output for underscores and don't find any, chances are it came out clean.
    As I say though, my only experience with this technique is using English Subs.
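    The underscore check can be automated. A minimal Python sketch, assuming a plain SRT file in which cues are separated by blank lines; `suspicious_cues` is a hypothetical name, not part of any of the tools mentioned.

```python
import re
from typing import List


def suspicious_cues(srt_text: str) -> List[int]:
    """Numbers of SRT cues whose text lines contain '_', i.e. candidates for a manual check."""
    flagged: List[int] = []
    text = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):  # cues are separated by blank lines
        lines = block.splitlines()
        # lines[0] = cue number, lines[1] = timing, lines[2:] = subtitle text
        if len(lines) >= 3 and lines[0].strip().isdigit() and any("_" in l for l in lines[2:]):
            flagged.append(int(lines[0]))
    return flagged
```

    An empty list means no underscores were found; read the file with `encoding="utf-8-sig"` so a byte-order mark does not hide the first cue number.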
    http://milesaheadsoftware.org/
    Fully enabled freeware for Windows PCs.
  4. Member
    Join Date
    Oct 2011
    Location
    Spain
    Originally Posted by Nikse
    Hi edea!

    You could try subtitle edit: http://subtitleedit.googlecode.com/files/SubtitleEdit32Setup.zip
    SE should be able to import+ocr both sub/idx and son/bmp... I would like to add support for importing subtitles directly from ts files, but that will be a later version.
    Thanks! Your program looks great, but I can't open the SON subtitle; I get this error:

    Code:
    See the end of this message for details on invoking 
    just-in-time (JIT) debugging instead of this dialog box.
    
    ************** Exception Text **************
    System.ArgumentException: Parameter is not valid.
       at System.Drawing.Bitmap.LockBits(Rectangle rect, ImageLockMode flags, PixelFormat format, BitmapData bitmapData)
       at System.Drawing.Bitmap.LockBits(Rectangle rect, ImageLockMode flags, PixelFormat format)
       at Nikse.SubtitleEdit.Logic.FastBitmap.LockImage()
       at Nikse.SubtitleEdit.Forms.VobSubOcr.GetSubtitleBitmap(Int32 index)
       at Nikse.SubtitleEdit.Forms.VobSubOcr.ShowSubtitleImage(Int32 index)
       at Nikse.SubtitleEdit.Forms.VobSubOcr.SubtitleListView1SelectedIndexChanged(Object sender, EventArgs e)
       at System.Windows.Forms.ListView.OnSelectedIndexChanged(EventArgs e)
       at System.Windows.Forms.ListView.WmReflectNotify(Message& m)
       at System.Windows.Forms.ListView.WndProc(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
       at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
    
    
    ************** Loaded Assemblies **************
    mscorlib
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
    ----------------------------------------
    SubtitleEdit
        Assembly Version: 3.2.0.33640
        Win32 Version: 3.2.0.33640
        CodeBase: file:///C:/Archivos%20de%20programa/Subtitle%20Edit/SubtitleEdit.exe
    ----------------------------------------
    System
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System/2.0.0.0__b77a5c561934e089/System.dll
    ----------------------------------------
    System.Windows.Forms
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms/2.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
    ----------------------------------------
    System.Drawing
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing/2.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
    ----------------------------------------
    System.Xml
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml/2.0.0.0__b77a5c561934e089/System.Xml.dll
    ----------------------------------------
    System.Windows.Forms.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms.resources/2.0.0.0_es_b77a5c561934e089/System.Windows.Forms.resources.dll
    ----------------------------------------
    System.XML.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml.resources/2.0.0.0_es_b77a5c561934e089/System.Xml.resources.dll
    ----------------------------------------
    mscorlib.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
    ----------------------------------------
    NHunspell
        Assembly Version: 0.9.6.0
        Win32 Version: 0.9.6.0
        CodeBase: file:///C:/Archivos%20de%20programa/Subtitle%20Edit/NHunspell.DLL
    ----------------------------------------
    System.Drawing.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing.resources/2.0.0.0_es_b03f5f7f11d50a3a/System.Drawing.resources.dll
    ----------------------------------------
    
    ************** JIT Debugging **************
    To enable just-in-time (JIT) debugging, the .config file for this
    application or computer (machine.config) must have the
    jitDebugging value set in the system.windows.forms section.
    The application must also be compiled with debugging
    enabled.
    
    For example:
    
    <configuration>
        <system.windows.forms jitDebugging="true" />
    </configuration>
    
    When JIT debugging is enabled, any unhandled exception
    will be sent to the JIT debugger registered on the computer
    rather than be handled by this dialog box.
    Have you tried opening the SON file I uploaded to Megaupload? (The link is in the first message.)

    And if I open the SUB/IDX (the clearest one I've been able to extract, with ProjectX set to "UKFreeView", though not as clear as the BMP files), I get this after running the OCR:



    Nothing is recognized! I must say I've never had good results with Tesseract. In Ubuntu, the only tool that gives me almost perfect results is GOCR:

    yo@desktop:~$ gocr subsdigitales_st00001p4.bmp
    bmptoppm: Windows BMP, 720x132x8
    bmptoppm: WRITING PPM IMAGE
    Bueno, bueno, deja que te mire.
    Es que has...

    yo@desktop:~$ gocr subsdigitales_st00002p4.bmp
    bmptoppm: Windows BMP, 720x88x8
    bmptoppm: WRITING PPM IMAGE
    Pero _quė te ocurre?
    -Que me siento feliz.

    yo@desktop:~$ gocr subsdigitales_st00013p4.bmp
    bmptoppm: Windows BMP, 720x132x8
    bmptoppm: WRITING PPM IMAGE
    ...para molestar a tu abuelita.
    - _Sharon !

    yo@desktop:~$

    It only fails with accented vowels and the inverted question and exclamation marks (á é í ó ú, Á É Í Ó Ú, ¿ ¡). It even recognizes "ñ".
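    Since GOCR reads these bitmaps almost perfectly, one way to get the "SON+BMP to SRT" step from the first post is a small script that pairs each SON entry with GOCR's text. A minimal Python sketch, assuming `gocr` is on the PATH and takes the bitmap path as its only argument (as in the session above), 25 fps timecodes, a Latin-1 SON file and bitmaps stored next to the .son file; `son_to_srt` and `gocr_text` are hypothetical names. The OCR function is a parameter, so another engine (or a stub) can be plugged in.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable, List

FPS = 25  # assumption: PAL timecodes, last field = frame number

ENTRY_RE = re.compile(r"^(\d+)\s+(\d\d:\d\d:\d\d:\d\d)\s+(\d\d:\d\d:\d\d:\d\d)\s+(.+?)\s*$")


def tc_to_ms(tc: str) -> int:
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + round(f * 1000 / FPS)


def ms_to_srt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def gocr_text(bmp: Path) -> str:
    """Run gocr on one bitmap; drop empty lines and any bmptoppm chatter that reaches stdout."""
    out = subprocess.run(["gocr", str(bmp)], capture_output=True, text=True, check=True).stdout
    lines = [l for l in out.splitlines() if l.strip() and not l.startswith("bmptoppm:")]
    return "\n".join(lines)


def son_to_srt(son_path: Path, ocr: Callable[[Path], str] = gocr_text) -> str:
    cues: List[str] = []
    for line in son_path.read_text(encoding="latin-1").splitlines():
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue  # header, Color, Contrast and Display_Area lines
        _, start, end, bmp = m.groups()
        text = ocr(son_path.parent / bmp)
        cues.append(f"{len(cues) + 1}\n{ms_to_srt(tc_to_ms(start))} --> {ms_to_srt(tc_to_ms(end))}\n{text}\n")
    return "\n".join(cues)
```

    Usage would be along the lines of `Path("out.srt").write_text(son_to_srt(Path("subs.son")), encoding="utf-8")` (both file names hypothetical). GOCR's mistakes with accented vowels and ¿ ¡ would still need a fix-up pass afterwards.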



    Originally Posted by MilesAhead
    Another avenue you can try is to open the idx/sub in BDSup2Sub and export it as ifo/sup, then load the ifo/sup in DVDSubEdit.
    Run automatic OCR and export as .srt. I've had good luck with English subs. I'm not sure whether the upside-down question marks will throw it off.

    Usually lines that don't auto OCR well with this method show up with one or more underscores. Sort of like this:
    ap__%%*&__7?_x

    If you search the output for underscores and don't find any, chances are it came out clean.
    As I say though, my only experience with this technique is using English Subs.
    Thanks, I'll try it.
  5. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Hi edea!

    Thx for testing

    I could not open the SON file with SE 3.2 at all...
    To use Tesseract, you need to select the "Spanish" Tesseract dictionary for Spanish subtitles (and English for English subs).
    I've fixed SON file reading and included Spanish dictionaries in this version: http://www.nikse.dk/SubtitleEdit.zip

    You could also use "Image compare" as the OCR method (a bit like SubRip).
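    "Image compare" is described here only as "a bit like subrip". SubRip-style OCR matches each cropped bitmap (a character or, as edea observes later, sometimes a whole word) against bitmaps the user has already identified, and asks about anything it cannot match. A toy Python sketch of that general idea, not SE's actual algorithm; the 0/1 glyph representation, the pixel-count distance and the `max_diff` threshold are all assumptions.

```python
from typing import Dict, Optional, Tuple

Glyph = Tuple[Tuple[int, ...], ...]  # rows of 0/1 pixels, already cropped out of the subtitle bitmap


def pixel_distance(a: Glyph, b: Glyph) -> Optional[int]:
    """Number of differing pixels, or None when the two glyphs differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_glyph(glyph: Glyph, known: Dict[str, Glyph], max_diff: int = 2) -> Optional[str]:
    """Closest known glyph within max_diff differing pixels, else None ('ask the user')."""
    best, best_d = None, max_diff + 1
    for text, ref in known.items():
        d = pixel_distance(glyph, ref)
        if d is not None and d < best_d:
            best, best_d = text, d
    return best
```

    In an interactive tool, a `None` result would be shown to the user, and the typed answer stored in `known` so the same shape is recognized automatically next time.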
  6. Member
    Join Date
    Oct 2011
    Location
    Spain
    Thanks indeed for trying to fix it and adding the Spanish dictionaries. In this new version, I get the same error when I open the SON file:

    Code:
    See the end of this message for details on invoking 
    just-in-time (JIT) debugging instead of this dialog box.
    
    ************** Exception Text **************
    System.ArgumentException: Parameter is not valid.
       at System.Drawing.Bitmap.LockBits(Rectangle rect, ImageLockMode flags, PixelFormat format, BitmapData bitmapData)
       at System.Drawing.Bitmap.LockBits(Rectangle rect, ImageLockMode flags, PixelFormat format)
       at Nikse.SubtitleEdit.Logic.FastBitmap.LockImage()
       at Nikse.SubtitleEdit.Forms.VobSubOcr.GetSubtitleBitmap(Int32 index)
       at Nikse.SubtitleEdit.Forms.VobSubOcr.ShowSubtitleImage(Int32 index)
       at Nikse.SubtitleEdit.Forms.VobSubOcr.SubtitleListView1SelectedIndexChanged(Object sender, EventArgs e)
       at System.Windows.Forms.ListView.OnSelectedIndexChanged(EventArgs e)
       at System.Windows.Forms.ListView.WmReflectNotify(Message& m)
       at System.Windows.Forms.ListView.WndProc(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
       at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
    
    
    ************** Loaded Assemblies **************
    mscorlib
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
    ----------------------------------------
    SubtitleEdit
        Assembly Version: 3.2.0.24454
        Win32 Version: 3.2.0.24454
        CodeBase: file:///D:/Documentos%20de%20Inma/Downloads/SubtitleEdit/SubtitleEdit.exe
    ----------------------------------------
    System.Windows.Forms
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms/2.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
    ----------------------------------------
    System
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System/2.0.0.0__b77a5c561934e089/System.dll
    ----------------------------------------
    System.Drawing
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing/2.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
    ----------------------------------------
    System.Xml
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml/2.0.0.0__b77a5c561934e089/System.Xml.dll
    ----------------------------------------
    System.Windows.Forms.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms.resources/2.0.0.0_es_b77a5c561934e089/System.Windows.Forms.resources.dll
    ----------------------------------------
    System.XML.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml.resources/2.0.0.0_es_b77a5c561934e089/System.Xml.resources.dll
    ----------------------------------------
    mscorlib.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
    ----------------------------------------
    System.Drawing.resources
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.3053 (netfxsp.050727-3000)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing.resources/2.0.0.0_es_b03f5f7f11d50a3a/System.Drawing.resources.dll
    ----------------------------------------
    
    ************** JIT Debugging **************
    To enable just-in-time (JIT) debugging, the .config file for this
    application or computer (machine.config) must have the
    jitDebugging value set in the system.windows.forms section.
    The application must also be compiled with debugging
    enabled.
    
    For example:
    
    <configuration>
        <system.windows.forms jitDebugging="true" />
    </configuration>
    
    When JIT debugging is enabled, any unhandled exception
    will be sent to the JIT debugger registered on the computer
    rather than be handled by this dialog box.
    Then I press "Continue", but no image is shown, and every time I select a different subtitle I get the same error.
    If I select "Spanish" and then start the OCR, I get another error: MSVCR.dll is not found, and it can't continue.
  7. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by edea
    Thanks indeed for trying to fix it and adding the Spanish dictionaries. In this new version, I get the same error when I open the SON file...
    I cannot reproduce this error with the uploaded son/bmp... any chance you could upload the full son/bmp set?


    The included Tesseract version was not correct; this should be fixed here: http://www.nikse.dk/SubtitleEdit.zip
  8. Member
    Join Date
    Oct 2011
    Location
    Spain
    Sure, here they are, attached.


    By the way, I tried the Spanish dictionary (on a SUP file):




    "Image compare" does not detect individual characters, only complete words...

    Any chance you could add GOCR as another OCR method?
    Last edited by edea; 14th Oct 2011 at 18:12.
  9. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Hi edea!

    The uploaded file is the same as the first sample file (with spaces in file name)!
    The one SE crashes on must be another without spaces in file name, right (or wrong)?

    The uploaded SON works very well for me:
    [Screenshot: SON-import.png]

    >Any chance you could add GOCR as another OCR method?
    It should be possible, as it has a command-line interface. Hm, I could not make GOCR open any files; I tried PNG, BMP, and TIF...
  10. Member
    Join Date
    Oct 2011
    Location
    Spain
    Hi Nikse!
    I sent you an email with a video showing the error.
    I've just tried Subtitle Edit on a different computer, and it works! No error, and it works like a charm! (But I can't use my TV card on this one, since there is no antenna connection here.)
    What could be the problem with the other computer? Could it be the .NET Framework version? I had to install .NET Framework 4 on the other computer because another subtitle program required it.
  11. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Hi edea!

    Thx for the video/info!

    It looks like it's due to a limitation in how WinXP handles bitmaps, which I hope is fixed here: http://www.nikse.dk/SubtitleEdit.zip
    (it will be a bit slower on WinXP - but should not crash)
  12. Member
    Join Date
    Oct 2011
    Location
    Spain
    Thanks! I am going to test it now.
    But how did the other version work on the other computer? Both run Windows XP SP3.
  13. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by edea
    Thanks! I am going to test it now.
    But how did the other version work on the other computer? Both run Windows XP SP3.
    Perhaps you re-downloaded SE? I uploaded a version that works on XP very early this morning...
  14. Member
    Join Date
    Oct 2011
    Location
    Spain
    Could be...
    The only version I tried on the laptop (the "other" computer) was build 19559. Was the XP issue fixed in that version?

    Now I'm trying to use SE in Ubuntu, but I still don't know how to open it (Mono is installed, I think).

    Anyway, thank you very much for everything!
  15. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by edea
    Could be...
    The only version I tried on the laptop (the "other" computer) was build 19559. Was the XP issue fixed in that version?
    Could be... just go with the latest version, 3.2.2.

    Originally Posted by edea
    Now I'm trying to use SE in Ubuntu, but I still don't know how to open it (Mono is installed, I think).
    Have you tried "mono SubtitleEdit.exe"?
    (check the readme file)
  16. Member
    Join Date
    Oct 2011
    Location
    Spain
    Yes, that's what I tried (I read the readme), but it gave an error. I'll copy it later (I'm on Windows now, trying to cut the MPEG TS file).

    Edit: this is what I get:

    Code:
    yo@desktop:~/Descargas/SE32Linux$ mono SubtitleEdit.exe
    
    ** (SubtitleEdit.exe:14296): WARNING **: The following assembly referenced from /home/yo/Descargas/SE32Linux/SubtitleEdit.exe could not be loaded:
         Assembly:   System.Windows.Forms    (assemblyref_index=1)
         Version:    2.0.0.0
         Public Key: b77a5c561934e089
    The assembly was not found in the Global Assembly Cache, a path listed in the MONO_PATH environment variable, or in the location of the executing assembly (/home/yo/Descargas/SE32Linux/).
    
    
    ** (SubtitleEdit.exe:14296): WARNING **: Could not load file or assembly 'System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies.
    
    ** (SubtitleEdit.exe:14296): WARNING **: Missing method EnableVisualStyles in assembly /home/yo/Descargas/SE32Linux/SubtitleEdit.exe, type System.Windows.Forms.Application
    
    Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies.
    File name: 'System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'
    yo@desktop:~/Descargas/SE32Linux$
    Last edited by edea; 20th Oct 2011 at 15:23.
  17. Hi Nikse!

    I'm testing your great SE program thanks to edea's suggestion.
    I'm running SE v3.2.2, build 25663 in XP SP3, .NET 2.0 and all work fine for me.

    My old workflow was to use SupRip/SubRip and then OpenOffice for spell checking; now I can do both tasks together thanks to SE.
    And the job is easier thanks to new words/names being stored in static files.
    But the best improvement is the rules in eng_OCRFixReplaceList.xml; I'm making my own spa_OCRFixReplaceList.xml.

    I have a question.
    Sometimes I get SRT files from other users that I need to spell-check. When I load the file and click 'Spell Check', the rules in xxx_OCRFixReplaceList.xml aren't applied.
    Is it possible to add this option, using something similar to xxx_OCRFixReplaceList.xml?

    Thanks.
  18. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by tebasuna51
    My old workflow was to use SupRip/SubRip and then OpenOffice for spell checking; now I can do both tasks together thanks to SE.
    And the job is easier thanks to new words/names being stored in static files.
    But the best improvement is the rules in eng_OCRFixReplaceList.xml; I'm making my own spa_OCRFixReplaceList.xml.
    Nice! Perhaps you could email me your "spa_OCRFixReplaceList.xml" once you have added some words.


    Originally Posted by tebasuna51
    Sometimes I get SRT files from other users that I need to spell-check. When I load the file and click 'Spell Check', the rules in xxx_OCRFixReplaceList.xml aren't applied.
    Is it possible to add this option, using something similar to xxx_OCRFixReplaceList.xml?
    "Tools -> Fix common errors -> Fix ocr errors" should do exactly this.
  19. Thanks.
    It seems the language is recognized, and with Spanish subs spa_OCRFixReplaceList.xml is used. OK.

    BTW, most of my 'l' -> 'I' problems were solved by changing
    <WordPart from="l" to="i" />
    to
    <WordPart from="l" to="I" />
    If the change lands inside a lowercase word (not at the beginning), a second pass fixes it.

    What is the difference between <PartialWordsAlways> and <PartialWords>?
    The description is the same for both:
    <!-- Will be used to check words not in dictionary -->
    <!-- If new word(s) exists in spelling dictionary, it(they) is accepted -->

    Now I'm working on another typical Spanish problem: 'i' -> "¡" (the opening exclamation mark).
    In another text editor I use regular expressions for this (a lowercase i followed by a capital letter must be changed to '¡' followed by the same capital letter),
    but I don't know which regular expression syntax (there are many) SE uses.

    Thanks for your help.
  20. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by edea
    Yes, that's what I tried (I read the readme), but it gave an error. I'll copy it later (I'm on Windows now, trying to cut the MPEG TS file).

    Edit: this is what I get:

    Code:
    yo@desktop:~/Descargas/SE32Linux$ mono SubtitleEdit.exe
    
    ** (SubtitleEdit.exe:14296): WARNING **: The following assembly referenced from /home/yo/Descargas/SE32Linux/SubtitleEdit.exe could not be loaded:
         Assembly:   System.Windows.Forms    
         Version:    2.0.0.0
    ...
    It looks like you're missing "System.Windows.Forms"
    You might need a newer version of Mono - or perhaps this can help: http://ubuntuforums.org/showthread.php?t=851578
  21. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by tebasuna51
    Thanks.
    What is the difference between <PartialWordsAlways> and <PartialWords>?
    The description is the same for both:
    "PartialWordsAlways" is always replaced.
    "PartialWords" is only replaced if the new word is spelled correctly and is longer than five characters.
    (I'll update the comments - thx)
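    To make the difference concrete, here is a toy Python model of the two rules as described above. It is not SE's code: a plain set stands in for the spell checker, and the rule pairs in the usage below are hypothetical.

```python
from typing import Dict, Set


def apply_partial_rules(word: str, always: Dict[str, str], conditional: Dict[str, str],
                        dictionary: Set[str]) -> str:
    # PartialWordsAlways: replace unconditionally.
    for old, new in always.items():
        word = word.replace(old, new)
    # PartialWords: keep a replacement only if the result is a correctly spelled
    # word (here: present in `dictionary`) that is longer than five characters.
    for old, new in conditional.items():
        candidate = word.replace(old, new)
        if candidate != word and len(candidate) > 5 and candidate.lower() in dictionary:
            word = candidate
    return word
```

    With a PartialWords pair `"rn" -> "m"`, "carnino" becomes "camino" (six letters, a known word), while "carne" -> "came" is rejected as too short even if "came" were in the dictionary.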


    Originally Posted by tebasuna51
    Now I'm working on another typical Spanish problem: 'i' -> "¡" (the opening exclamation mark).
    In another text editor I use regular expressions for this (a lowercase i followed by a capital letter must be changed to '¡' followed by the same capital letter),
    but I don't know which regular expression syntax (there are many) SE uses.
    Hm, in Edit -> Multi replace you can try this reg ex: \b(?<test>i)[A-Z]
    Also, try "Tools -> Fix common errors -> Fix Spanish question and exclamation marks".
  22. Member
    Join Date
    Oct 2011
    Location
    Spain
    Originally Posted by Nikse
    It looks like you're missing "System.Windows.Forms"
    You might need a newer version of Mono - or perhaps this can help: http://ubuntuforums.org/showthread.php?t=851578
    Thank you Nikse, installing libmono-winforms2.0-cil did the trick and now SE opens, but I can't select any language... so no OCR...
  23. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Originally Posted by edea
    Originally Posted by Nikse
    It looks like you're missing "System.Windows.Forms"
    You might need a newer version of Mono - or perhaps this can help: http://ubuntuforums.org/showthread.php?t=851578
    Thank you Nikse, installing libmono-winforms2.0-cil did the trick and now SE opens, but I can't select any language... so no OCR...
    Nice, you cannot select any... Tesseract language?
    How did you install Tesseract?
    Where is your "tessdata" folder located? /usr/share/tesseract-ocr/tessdata?
  24. Member
    Join Date
    Oct 2011
    Location
    Spain
    No, I can't select ANY language...
    [Attached screenshots: Pantallazo-17.png, Pantallazo-16.png]

  25. Originally Posted by Nikse
    Hm, in Edit -> Multi replace you can try this reg ex: \b(?<test>i)[A-Z]
    Also, try "Tools -> Fix common errors -> Fix Spanish question and exclamation marks".
    Thanks.

    "Fix Spanish question and exclamation marks" works for missing characters, but doesn't replace the "i".

    \b(?<test>i)[A-Z¿] works fine, thanks (I also added '¿').
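    SE is a .NET program, so the pattern above uses .NET regex syntax, where `(?<test>...)` is a named group. For anyone doing the same fix outside SE, a rough Python equivalent is sketched below; Python spells named groups `(?P<name>...)`, and this sketch avoids the group altogether by using a lookahead so that only the `i` is replaced. As with the pattern above, `[A-Z¿]` does not cover accented capitals such as `Á`, and a word like "iPhone" would be changed too. The function name is hypothetical.

```python
import re

# A word-initial lowercase "i" directly followed by a capital letter or "¿" becomes "¡".
# The lookahead checks the next character without consuming it, so it is left untouched.
OPEN_EXCLAMATION_FIX = re.compile(r"\bi(?=[A-Z¿])")


def fix_inverted_exclamation(text: str) -> str:
    return OPEN_EXCLAMATION_FIX.sub("¡", text)
```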
  26. Member
    Join Date
    Jul 2011
    Location
    Denmark
    Search Comp PM
    Originally Posted by edea
    No, I can't select ANY language...
    Hi edea!

    I think you have Tesseract 2.x... SE needs 3.x!
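    Tesseract 3.x stores each language in a single `<lang>.traineddata` file, while 2.x used several separate files per language (such as `eng.inttemp` and `eng.unicharset`). A quick Python check of which kind a tessdata folder contains is sketched below; that SE builds its language list from `.traineddata` files is an assumption here, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def tesseract3_languages(tessdata: Path) -> List[str]:
    """Language codes that have a Tesseract 3.x-style '<lang>.traineddata' file."""
    return sorted(p.stem for p in tessdata.glob("*.traineddata"))


def describe_tessdata(tessdata: Path) -> str:
    if not tessdata.is_dir():
        return f"{tessdata} does not exist"
    langs = tesseract3_languages(tessdata)
    if langs:
        return "Tesseract 3.x data for: " + ", ".join(langs)
    if any(tessdata.glob("*.inttemp")):  # one of the per-language files used by 2.x
        return "only Tesseract 2.x-style files found"
    return "no Tesseract language data found"
```

    For the folder asked about above, this would be `print(describe_tessdata(Path("/usr/share/tesseract-ocr/tessdata")))`.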


