Someone sent me part of a book in PDF format. (See the book.pdf file at the end of this post.)
You can clearly see that it is a photo (image) of the book from the shading in the book fold and the slightly slanted writing. However, I was very surprised that I can highlight the text and copy/paste it into Microsoft Word.
How can this be? When I scan a book to PDF as an image, I cannot highlight or copy/paste the text. If I OCR it with another application, the shading and slanted writing are gone, but OCR errors are introduced.
The above file looks like an image, but the text can be highlighted, copied and pasted. Can someone please explain how this can be?
book.pdf
-
Good OCR software can do that just fine. Acrobat should be able to import and OCR that type of scan.
-
The text isn't graphics. It's slanted text overlaid onto images that look like a scanned book.
-
When you make your own scans, make sure to use "Custom" and check the box for Adobe Acrobat Pro to OCR the image. Then the text is highlightable/copyable.
-
It looks like it contains invisible text overlaid onto graphics of the text. If you zoom way in and select words and/or letters, you'll see that the selection doesn't always line up perfectly with the visible text.
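To make the "invisible text over an image" idea concrete, here is a minimal sketch of what the drawing instructions for one such page could look like. It only builds the page's content stream as a string, not a complete PDF, and it is not necessarily how this particular file was produced. The resource names `Im0` and `F1`, the function name, and the word positions are hypothetical. The one firm detail is the PDF operator `3 Tr`, which selects text rendering mode 3 (neither filled nor stroked, so the text is invisible but still selectable). The slant seen in the file is ignored here.

```python
def build_page_content(image_name, page_width, page_height, words):
    """Build a PDF content stream: scanned image first, invisible OCR text on top.

    words is a list of (text, x, y, font_size) tuples in page coordinates.
    image_name and the font name F1 are hypothetical resource names.
    """
    lines = [
        "q",                                        # save graphics state
        f"{page_width} 0 0 {page_height} 0 0 cm",   # scale the unit square to the page
        f"/{image_name} Do",                        # paint the scanned page image
        "Q",                                        # restore graphics state
        "BT",                                       # begin text object
        "3 Tr",                                     # rendering mode 3: invisible text
    ]
    for text, x, y, size in words:
        # Escape the characters that are special inside a PDF literal string.
        escaped = (
            text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        lines.append(f"/F1 {size} Tf")        # select font and size
        lines.append(f"1 0 0 1 {x} {y} Tm")   # position the text
        lines.append(f"({escaped}) Tj")       # "show" the (invisible) text
    lines.append("ET")                        # end text object
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_page_content("Im0", 612, 792,
                             [("I had left three cases in", 72, 700, 11)]))
```

A viewer paints the image and lays the hidden words over it, so what you see is the scan and what you select is the OCR text. If the OCR positions are slightly off, the selection will not line up with the visible letters.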
-
In Acrobat try this:
Tools -> Comment & Markup -> Text Edits
Highlight any word and type in a different word. You won't see the new word, but if you copy/paste to Notepad (or similar), the word you typed will be there.
So, as minidv2dvd stated, it was scanned and OCR'd. It looks like the scan (image) and the OCR data are stored separately in the PDF, and you see only the scanned image.
Also, some of the scan was not OCR'd correctly, and a copy/paste shows garbage.
-
OK, so when I look at the pdf, I'm looking at an image, but hidden behind the image is the text that can be highlighted/copied/pasted. Very interesting. Can you tell me what part was OCRed incorrectly? I can't find it.
-
jimdagys:
I had to really search this time. The first time it was the 2nd line/word I checked. S--t luck, I guess.
Try page #160: the cursor highlights between lines. Also, in the third line the "I" is the #1 --- "1 had left three cases in"
Page #169 Par 2 Line 5 - "I .ncy's food cans were"
jagabo, please forgive me. Twice I've retraced your steps. I'm 70 and it takes a while for this old mind to comprehend what people write, say, or do.
george
-
OK, thanks. I just highlighted the whole page (instead of a few lines), copied and pasted it into Microsoft Word, and the few OCR problems are obvious. Learned something new.
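For longer passages, the pasted text can be screened for the two kinds of slip found in this thread. This is only a rough sketch: the function name and both rules are assumptions made for illustration, based on the two examples quoted above (a lone "1" where "I" was meant, and a word starting with stray punctuation). It will miss other errors and can flag legitimate text such as "1 apple".

```python
import re


def flag_suspect_tokens(text):
    """Return (index, token) pairs that look like OCR slips.

    Heuristic only: the two rules mirror the examples from this thread.
    """
    tokens = text.split()
    suspects = []
    for i, token in enumerate(tokens):
        next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Rule 1: a lone digit 1 followed by a lowercase word, e.g. "1 had".
        if token == "1" and next_token.isalpha() and next_token.islower():
            suspects.append((i, token))
        # Rule 2: a token that starts with punctuation and then a letter,
        # e.g. ".ncy's".
        elif re.match(r"[.,;:!?][A-Za-z]", token):
            suspects.append((i, token))
    return suspects


if __name__ == "__main__":
    for line in ["1 had left three cases in", "I .ncy's food cans were"]:
        print(line, "->", flag_suspect_tokens(line))
```

Anything it flags still needs to be checked by eye against the page image.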