Posted to users@pdfbox.apache.org by Michael Howard <mi...@uforlife.com> on 2010/04/06 05:20:48 UTC

PrintImageLocations problems

I am working with some scanned PDF documents, one image per page,
with OCR text behind the page image.
I need to extract the OCR text that lies behind a rectangle the user
selects with the mouse.
I believe I can use the technique from ExtractTextByArea, but I need to
scale from image (pixel) coordinates to the 72-per-inch PDF units used for text.
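
A minimal sketch of that scaling in plain Java, with no PDFBox calls. It
assumes the page image is drawn with an unrotated, unskewed transformation
matrix [a 0 0 d e f], so that a and d are the image's drawn width and height
in PDF units and (e, f) is its lower-left corner. The class name, method name
and sample numbers are hypothetical and only illustrate the arithmetic.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper: maps a pixel rectangle selected on a page image
// to PDF user-space units (1/72 inch).
final class ImageToPdfUnits {

    // px, py, pw, ph: selection in image pixels, origin at the image's top-left, y pointing down.
    // imgW, imgH:     image size in pixels.
    // a, d:           drawn width and height of the image in PDF units (matrix scale terms).
    // e, f:           lower-left corner of the drawn image in PDF units (matrix translation terms).
    // Assumption: the image is not rotated or skewed on the page.
    // Returns the selection in PDF user space: origin at bottom-left, y pointing up.
    static Rectangle2D.Double toUserSpace(double px, double py, double pw, double ph,
                                          double imgW, double imgH,
                                          double a, double d, double e, double f) {
        double x = e + (px * a) / imgW;
        double width = (pw * a) / imgW;
        double height = (ph * d) / imgH;
        // Flip the vertical axis: pixel rows count down from the top,
        // PDF units count up from the bottom of the drawn image.
        double y = f + d - ((py + ph) * d) / imgH;
        return new Rectangle2D.Double(x, y, width, height);
    }

    public static void main(String[] args) {
        // Sample values: a 2550 x 3300 pixel scan drawn over a 612 x 792 unit page.
        Rectangle2D.Double r = toUserSpace(300, 600, 600, 150, 2550, 3300, 612, 792, 0, 0);
        System.out.println(r.getX() + " " + r.getY() + " " + r.getWidth() + " " + r.getHeight());
    }
}
```

The result uses the PDF convention of a bottom-left origin. If the code that
consumes the rectangle expects a top-left origin instead, the y value has to
be flipped once more against the page height before it is passed on.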

When I run the PrintImageLocations example, I get strange/unknown
values for the image width and height.
A search of the PDFBox mail archive turns up a discussion of this problem
from Dec 2009.
In the thread
  http://markmail.org/message/m5tcighpru2dccbu
Andreas Lehmkühler recommends using the technique used in
  http://svn.apache.org/repos/asf/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/operator/pagedrawer/Invoke.java
Unfortunately, this URL is currently broken.

Any assistance/pointers would be greatly appreciated.

Thanks,
Michael

Re: PrintImageLocations problems

Posted by Daniel Wilson <wi...@gmail.com>.
Here is the correction to that URL:
http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/operator/pagedrawer/Invoke.java

There was a code reorganization recently that resulted in the change.

Daniel

On Mon, Apr 5, 2010 at 11:20 PM, Michael Howard <mi...@uforlife.com> wrote:

> I am working with some scanned .pdf documents, one image per page,
> with OCR text behind the page image.
> I need to extract the OCR text behind a user mouse selection of a
> rectangle.
> I believe I can use the techniques of ExtractTextByArea, but I need to
> scale from the image coordinates to the 72/inch PDF units for text.
>
> When using the PrintImageLocations example I am getting
> strange/unknown width & height.
> Search of the pdfbox mail archive shows discussion of this problem
> back in Dec 2009.
> In the thread
>  http://markmail.org/message/m5tcighpru2dccbu
> Andreas Lehmkühler recommends using the technique used in
>
> http://svn.apache.org/repos/asf/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/operator/pagedrawer/Invoke.java
> Unfortunately, this URL is currently broken.
>
> Any assistance/pointers would be greatly appreciated.
>
> Thanks,
> Michael
>