VideoHelp Forum




  1. Someone sent me part of a book in pdf format. (See book.pdf file at end of this post)
    You can clearly see that it is a photo (image) of the book by looking at the shading in the book fold and the slightly slanted writing. However, I was very surprised that I can highlight the text and copy/paste it into Microsoft Word.
    How can this be? When I scan a book to PDF as an image, I cannot highlight or copy/paste the text. If I OCR it with another application, the shading and slanted writing are gone, but OCR errors are introduced.
    The above file looks like an image, but the text can be highlighted and copied/pasted. Can someone please explain how this can be?

    book.pdf
  2. rallynavvie (contrarian)
    Join Date
    Sep 2002
    Location
    Minnesotan in Texas
    Good OCR software can do that just fine. Acrobat should be able to import and OCR that type of scan.
    FB-DIMM are the real cause of global warming
  3. The text isn't graphics. It's slanted text overlaid onto images that look like a scanned book.
  4. When you make your own scans, make sure to choose "Custom" and check the box for Adobe Acrobat Pro to OCR the image. Then the text is highlightable/copyable.


  5. It looks like it contains both graphics of the text and invisible text overlaid on them. If you zoom way in and select words and/or letters, you'll see that the selection doesn't always line up perfectly with the visible text.
  6. Member
    Join Date
    Aug 2004
    Location
    Western Ma. United States
    In Acrobat try this:

    Tools -> Comment & Markup -> Text Edits

    Highlight any word and type in a different word.
    You won't see the new word, but if you copy/paste
    to notepad (or such) the word you typed will be there.
    So, as minidv2dvd stated, it was scanned and OCR'd.
    It looks like the scan (image) and the OCR data are
    separate in the PDF; you see only the scanned image.
    Also, some of the scan was not OCR'd correctly,
    and a copy/paste shows garbage.
    The Second Amendment:
    AMERICA'S ORIGINAL
    HOMELAND SECURITY
  7. OK, so when I look at the pdf, I'm looking at an image, but hidden behind the image is the text that can be highlighted/copied/pasted. Very interesting. Can you tell me what part was OCRed incorrectly? I can't find it.
  8. Member
    Join Date
    Aug 2004
    Location
    Western Ma. United States
    jimdagys:

    I had to really search this time; the first time, it was the 2nd line/word
    I checked. S--t luck, I guess.


    Try page #160; the cursor highlights between the lines.
    Also, in the third line the "I" is the digit 1: "1 had left three cases in"

    Page #169 Par 2 Line 5 - "I .ncy's food cans were"

    jagabo, please forgive me. Twice I've retraced your steps.

    I'm 70 and it takes a while for this old mind to comprehend
    what people write, say, or do.

    george
  9. OK, thanks. I just highlighted the whole page (instead of a few lines), copied and pasted it into Microsoft Word, and the few OCR problems are obvious. Learned something new.
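Several replies describe the same mechanism. The page is drawn as an image, and the OCR'd text is placed on top of it but never painted. As a rough illustration only (this is not how Acrobat or any particular OCR tool writes its files), the Python sketch below builds a one-page PDF by hand. A tiny 2x1 gray image stands in for the scan, and one line of text is drawn with text render mode 3 (`3 Tr`), which the PDF format defines as neither filled nor stroked, so it is invisible. A viewer shows only the image, but selecting and copying the page picks up the string. The function name `build_pdf` and the sample sentence are hypothetical. Real OCR output typically positions each word or line separately so it sits over the matching part of the scan, and small misalignments in that placement are one plausible reason the selection doesn't always line up with the visible letters.

```python
def build_pdf(hidden_text: str) -> bytes:
    """Return a one-page PDF: a tiny gray 'scan' image plus an invisible text layer.

    Hypothetical helper for illustration; only Latin-1 text is supported.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = (hidden_text.replace("\\", "\\\\")
               .replace("(", "\\(").replace(")", "\\)"))
    content = (
        b"q 200 0 0 100 0 0 cm /Im1 Do Q\n"   # stretch the image over the whole page
        b"BT /F1 12 Tf 3 Tr 20 50 Td ("        # 3 Tr = render mode 3: invisible text
        + escaped.encode("latin-1") + b") Tj ET"
    )
    image = bytes([0xCC, 0x33])  # 2x1 pixels of 8-bit gray standing in for the scan
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 100] "
        b"/Resources << /XObject << /Im1 4 0 R >> /Font << /F1 5 0 R >> >> "
        b"/Contents 6 0 R >>",
        b"<< /Type /XObject /Subtype /Image /Width 2 /Height 1 "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length 2 >>\nstream\n"
        + image + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_at))
    return bytes(out)
```

For example, `open("layer_demo.pdf", "wb").write(build_pdf("I had left three cases in"))` should produce a file that shows only the gray image in a viewer but yields the sentence when you select all and copy. Removing `3 Tr` (the default is mode 0, filled) would make the same text visible on top of the image.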
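The last post's approach (paste the whole page into Word and read it) works well for a few pages. For a longer book, a crude automated first pass might help. The sketch below is a hypothetical heuristic, not a feature of Acrobat or Word. It scans pasted text for the two error patterns george quoted: a standalone digit `1` before a lowercase word ("1 had left three cases in") and punctuation fused to the start of a word (".ncy's"). The names `SUSPICIOUS` and `flag_ocr_suspects` are made up for illustration. The check will miss many OCR errors and will flag some correct text (e.g. "1 of them"), so treat its output only as a list of places to compare against the page image.

```python
import re
from typing import List, Tuple

# Hypothetical heuristics for two common OCR confusions; not taken from any OCR tool.
SUSPICIOUS = [
    # A lone "1" followed by a space and a lowercase word, e.g. "1 had" for "I had".
    (re.compile(r"(?<![\w.])1(?= [a-z])"), "digit 1 where the pronoun I is likely"),
    # Punctuation glued to the start of a word, e.g. ".ncy's" for a misread name.
    (re.compile(r"(?<!\w)[.,;:][a-z]\w*"), "punctuation fused to the start of a word"),
]


def flag_ocr_suspects(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, matched text, reason) for each suspicious token."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, reason in SUSPICIOUS:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(), reason))
    return hits
```

Run on the two lines quoted in post 8, it reports the lone `1` on the first line and `.ncy` on the second, while a line such as "In 1998 the cans were fine." produces no hits.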


