Posted to users@pdfbox.apache.org by Tilman Hausherr <TH...@t-online.de> on 2016/09/25 10:31:04 UTC

Re: Newbie question about parsing PDFs

On 25.09.2016 at 12:24, David Goodenough wrote:
> I need to take a PDF document and extract each item of text with its
> position on the page.  PDFBox looks to be a good tool to use, but the
> examples are mainly to do with building PDFs rather than parsing them
> and the API is very rich (for which read large).
>
> Does anyone have any code they would be prepared to share that does
> this kind of parsing, or some pointers as to which classes I should
> be looking at?
Have a look at PrintTextLocations.java in the source download.

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: Newbie question about parsing PDFs

Posted by David Goodenough <da...@btconnect.com>.

On Sunday, 25 September 2016 12:31:04 BST Tilman Hausherr wrote:
> On 25.09.2016 at 12:24, David Goodenough wrote:
> > I need to take a PDF document and extract each item of text with its
> > position on the page.  PDFBox looks to be a good tool to use, but the
> > examples are mainly to do with building PDFs rather than parsing them
> > and the API is very rich (for which read large).
> > 
> > Does anyone have any code they would be prepared to share that does
> > this kind of parsing, or some pointers as to which classes I should
> > be looking at?
> 
> Have a look at PrintTextLocations.java in the source download.
> 
> Tilman
Wonderful, looks like exactly what I was looking for.

David