Posted to users@pdfbox.apache.org by Ali Husain <sm...@gmail.com> on 2016/09/26 23:44:40 UTC

Question about PDFBox parsing

Hello!

I'm new to PDFBox and I'm trying to extract inline images from a PDF
document.

I'm having trouble with an image that has many parts - here's the
breakdown. (Image is also attached)

[image: Inline image 1]

The XObject with 13 elements is actually one image. They are all different
components of the picture. I'm not able to maintain the order; instead, I
get each image individually.

Has anyone had a similar problem? Is there a known solution?

Thank you,
Ali

Re: Question about PDFBox parsing

Posted by John Hewson <jo...@jahewson.com>.
> On 26 Sep 2016, at 16:44, Ali Husain <sm...@gmail.com> wrote:
> 
> Hello!
> 
> I'm new to PDFBox and I'm trying to extract inline images from a PDF document.
> 
> I'm having trouble with an image that has many parts - here's the breakdown. (Image is also attached)
> 
> <image.png>
> 
> The XObject with 13 elements is actually one image. They are all different components of the picture. I'm not able to maintain the order; instead, I get each image individually.
> 
> Has anyone had a similar problem? Is there a known solution?

Take a look at the CustomGraphicsStreamEngine example:

https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/rendering/CustomGraphicsStreamEngine.java

You can override the image drawing methods if you want to know when and where specific images are drawn on the page. See the PageDrawer source code for how to calculate the image position from the current transformation matrix.
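
As a rough illustration of the position calculation, here is a minimal sketch that uses only JDK classes. It relies on the PDF convention that every image is painted into the unit square (0,0)-(1,1) and placed on the page by the current transformation matrix (CTM). The class name ImageBounds and its methods are hypothetical, not part of PDFBox. The assumption is that, inside an overridden drawImage(PDImage), the CTM is read with getGraphicsState().getCurrentTransformationMatrix().createAffineTransform() and collected in drawing order, once per tile.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.util.List;

// Hypothetical helper (not a PDFBox class): turns the CTMs recorded for each
// drawn image tile into bounding boxes in page user space.
final class ImageBounds {
    private ImageBounds() {}

    // Bounding box of one image drawn under the given CTM.
    static Rectangle2D boundsOf(AffineTransform ctm) {
        // Corners of the unit square that PDF uses as image space.
        double[] corners = {0, 0, 1, 0, 1, 1, 0, 1};
        ctm.transform(corners, 0, corners, 0, 4);
        Rectangle2D box = new Rectangle2D.Double(corners[0], corners[1], 0, 0);
        for (int i = 2; i < corners.length; i += 2) {
            box.add(corners[i], corners[i + 1]);
        }
        return box;
    }

    // Smallest rectangle covering all tiles, or null if the list is empty.
    static Rectangle2D unionOf(List<AffineTransform> ctms) {
        Rectangle2D union = null;
        for (AffineTransform ctm : ctms) {
            Rectangle2D box = boundsOf(ctm);
            union = (union == null) ? box : union.createUnion(box);
        }
        return union;
    }
}
```

Sorting the per-tile boxes by position, or taking their union, would give the layout of the composite picture. Deciding which tiles belong to the same picture is still up to the caller.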

— John

> Thank you,
> Ali
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Question about PDFBox parsing

Posted by Tilman Hausherr <TH...@t-online.de>.
On 27.09.2016 at 01:44, Ali Husain wrote:
> Hello!
>
> I'm new to PDFBox and I'm trying to extract inline images from a PDF 
> document.
>
> I'm having trouble with an image that has many parts - here's the 
> breakdown. (Image is also attached)
>
> Inline image 1
>
> The XObject with 13 elements is actually one image. They are all 
> different components of the picture. I'm not able to maintain the 
> order; instead, I get each image individually.
>
> Has anyone had a similar problem? Is there a known solution?

I remember someone reporting a similar problem in the JIRA issue tracker 
years ago, but I can't find it. There is no solution, i.e. the images 
don't carry any simple properties that say how to put them together.
You can only render the PDF page as a whole and then crop out the image 
if you know where it is.
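
The crop step might look roughly like the sketch below, which uses only JDK classes. It assumes the page has already been rendered to a BufferedImage (in PDFBox that would typically be PDFRenderer.renderImageWithDPI), that the page is not rotated, and that its crop box starts at the origin. The class PageCrop and its parameters are hypothetical names chosen for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical helper (not a PDFBox class): cuts a region given in PDF user
// space (origin bottom-left, 72 units per inch) out of a rendered page image
// (origin top-left, pixels).
final class PageCrop {
    private PageCrop() {}

    static BufferedImage crop(BufferedImage page, Rectangle2D region,
                              double pageHeightPt, double dpi) {
        double scale = dpi / 72.0;
        // Flip the y axis: the top edge of the region in user space becomes
        // the smaller y value in pixel space.
        int left = (int) Math.floor(region.getMinX() * scale);
        int top = (int) Math.floor((pageHeightPt - region.getMaxY()) * scale);
        int right = (int) Math.ceil(region.getMaxX() * scale);
        int bottom = (int) Math.ceil((pageHeightPt - region.getMinY()) * scale);
        // Clamp to the rendered image so getSubimage cannot go out of bounds.
        left = Math.max(0, left);
        top = Math.max(0, top);
        right = Math.min(page.getWidth(), right);
        bottom = Math.min(page.getHeight(), bottom);
        if (right <= left || bottom <= top) {
            throw new IllegalArgumentException("Region lies outside the page");
        }
        return page.getSubimage(left, top, right - left, bottom - top);
    }
}
```

The region still has to come from somewhere, for example from the image positions observed while the page's content stream is processed.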

Most likely this is done on purpose to prevent you from doing what you 
want to do, or to make it expensive.

Tilman



>
> Thank you,
> Ali
>