Posted to users@pdfbox.apache.org by Christopher Begley <ch...@outlook.com> on 2016/10/07 15:35:33 UTC

Dump all objects on page with coordinates (images, text, color boxes, lines)

Hello All!


New to PDFBox. My task is basically to map ALL elements on a page of a PDF document. This includes text, color boxes, highlights, underlines, lines, curves, images, etc.


Does there exist a way to dump all objects on a page and then retrieve information about each object? (Specifically, coordinates that can then be mapped to page coordinates in another file format).


From my limited perusal of the documentation, I don't see any obvious/intuitive way to do this. Can someone point me in the right direction on how to approach this problem?


Thanks in advance,

Re: Dump all objects on page with coordinates (images, text, color boxes, lines)

Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.10.2016 at 17:35, Christopher Begley wrote:
> Hello All!
>
>
> New to PDFBox. My task is basically to map ALL elements on a page of a PDF document. This includes text, color boxes, highlights, underlines, lines, curves, images, etc.
>
>
> Does there exist a way to dump all objects on a page and then retrieve information about each object? (Specifically, coordinates that can then be mapped to page coordinates in another file format).
>
>
> From my limited perusal of the documentation, I don't see any obvious/intuitive way to do this. Can someone point me in the right direction on how to approach this problem?

If you want "all", then our tools won't help because they're too narrow.
Download the sources, run PDFDebugger, and trace
PDFGraphicsStreamEngine and its parent class, PDFStreamEngine.

Tilman



Re: Dump all objects on page with coordinates (images, text, color boxes, lines)

Posted by Manuel Aristarán <ma...@jazzido.com>.
> On Oct 7, 2016, at 11:35 AM, Christopher Begley <ch...@outlook.com> wrote:
> […] My task is basically to map ALL elements on a page of a PDF document. This includes text, color boxes, highlights, underlines, lines, curves, images, etc.
> 
> Does there exist a way to dump all objects on a page and then retrieve information about each object? (Specifically, coordinates that can then be mapped to page coordinates in another file format).

Start by looking at PDFGraphicsStreamEngine. There are usage examples in the source tree.


--
Manuel Aristarán
http://jazzido.com
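
For the coordinate-mapping part of the original question, the following is a minimal sketch and not something taken from the thread or from PDFBox itself. It assumes an unrotated page whose media box starts at (0, 0), and a target format whose origin is the top-left corner with y growing downward. PDF's default user space puts the origin at the bottom-left with y growing upward, so the mapping is a vertical flip plus an optional scale. The class PdfCoordinateMapper and its methods toTopLeft and mapRect are hypothetical names; only java.awt.geom from the JDK is used.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class PdfCoordinateMapper {

    private PdfCoordinateMapper() { }

    /**
     * Builds a transform from PDF default user space (origin bottom-left,
     * y up) to a top-left-origin space (y down), multiplied by scale.
     * Assumption: the page is unrotated and its media box starts at (0, 0).
     */
    static AffineTransform toTopLeft(double pageHeight, double scale) {
        AffineTransform t = new AffineTransform();
        // Calls concatenate, so a point is first shifted by -pageHeight,
        // then flipped and scaled: (x, y) -> (scale*x, scale*(pageHeight - y)).
        t.scale(scale, -scale);
        t.translate(0, -pageHeight);
        return t;
    }

    /** Maps a rectangle given in PDF user space to the top-left-origin space. */
    static Rectangle2D mapRect(Rectangle2D pdfRect, double pageHeight, double scale) {
        return toTopLeft(pageHeight, scale)
                .createTransformedShape(pdfRect)
                .getBounds2D();
    }

    public static void main(String[] args) {
        // A 100 x 50 box whose lower-left corner is at (72, 692)
        // on a page that is 792 units tall (US Letter height in points).
        Rectangle2D box = new Rectangle2D.Double(72, 692, 100, 50);
        System.out.println(mapRect(box, 792, 1.0));
    }
}
```

Rectangles or points collected from a PDFGraphicsStreamEngine subclass could be passed through mapRect before being written to the other format. Pages with a rotation or with a media box that does not start at the origin would need an additional translation or rotation that this sketch does not cover.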