Posted to users@pdfbox.apache.org by David Goodenough <da...@btconnect.com> on 2016/09/25 10:24:30 UTC

Newbie question about parsing PDFs

I need to take a PDF document and extract each item of text with its
position on the page.  PDFBox looks to be a good tool to use, but the
examples are mainly to do with building PDFs rather than parsing them
and the API is very rich (for which read large).

Does anyone have any code they would be prepared to share that does
this kind of parsing, or some pointers as to which classes I should
be looking at?

Thank you

David

Re: Newbie question about parsing PDFs

Posted by David Goodenough <da...@btconnect.com>.
On Sunday, 25 September 2016 12:31:04 BST Tilman Hausherr wrote:
> On 25.09.2016 at 12:24, David Goodenough wrote:
> > I need to take a PDF document and extract each item of text with its
> > position on the page.  PDFBox looks to be a good tool to use, but the
> > examples are mainly to do with building PDFs rather than parsing them
> > and the API is very rich (for which read large).
> > 
> > Does anyone have any code they would be prepared to share that does
> > this kind of parsing, or some pointers as to which classes I should
> > be looking at?
> 
> Have a look at PrintTextLocations.java in the source download.
> 
> Tilman
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org

Wonderful, looks like exactly what I was looking for.

David


Re: Newbie question about parsing PDFs

Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.09.2016 at 12:24, David Goodenough wrote:
> I need to take a PDF document and extract each item of text with its
> position on the page.  PDFBox looks to be a good tool to use, but the
> examples are mainly to do with building PDFs rather than parsing them
> and the API is very rich (for which read large).
>
> Does anyone have any code they would be prepared to share that does
> this kind of parsing, or some pointers as to which classes I should
> be looking at?
Have a look at PrintTextLocations.java in the source download.

Tilman
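
For readers who do not have the source download to hand, here is a minimal sketch in the spirit of that example. It is not the shipped PrintTextLocations.java. It assumes PDFBox 2.0.x on the classpath, and the class name TextWithPositions is made up for illustration. The idea is to subclass PDFTextStripper and override writeString, which receives each run of text together with a TextPosition per character.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Sketch only: assumes PDFBox 2.0.x. The class name is hypothetical.
public class TextWithPositions extends PDFTextStripper {

    public TextWithPositions() throws IOException {
        super();
    }

    // PDFTextStripper calls this for each run of text it has assembled.
    // Instead of writing the text out, print every character with its position.
    @Override
    protected void writeString(String text, List<TextPosition> textPositions)
            throws IOException {
        for (TextPosition tp : textPositions) {
            System.out.printf("'%s' x=%.2f y=%.2f fontSize=%.2f width=%.2f%n",
                    tp.getUnicode(),
                    tp.getXDirAdj(),
                    tp.getYDirAdj(),
                    tp.getFontSize(),
                    tp.getWidthDirAdj());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java TextWithPositions <file.pdf>");
            System.exit(1);
        }
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            PDFTextStripper stripper = new TextWithPositions();
            stripper.setSortByPosition(true);
            // Page numbers are 1-based; process one page at a time
            // so the output can be labelled per page.
            for (int page = 1; page <= document.getNumberOfPages(); page++) {
                System.out.println("Page " + page);
                stripper.setStartPage(page);
                stripper.setEndPage(page);
                // The useful output is printed by writeString above;
                // whatever reaches this writer is discarded.
                stripper.writeText(document, new StringWriter());
            }
        }
    }
}
```

Run it with the PDFBox jar on the classpath and the path to a PDF as the only argument. As far as I understand the API, the "DirAdj" coordinates are adjusted for page rotation and text direction and are measured from the upper left corner of the page. If you want whole words or lines rather than single characters, you would group the TextPosition entries yourself, or use the text argument together with the first and last entries of the list. On PDFBox 3.x, PDDocument.load is no longer available and Loader.loadPDF takes its place.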
