
Posted to users@pdfbox.apache.org by Orit Prince <OR...@il.ibm.com> on 2021/12/19 08:32:27 UTC

extract graphics which is not an image

Hi

The first page of this PDF<https://s25.q4cdn.com/680186029/files/doc_financials/ar-interactive/2018-interactive/ar/images/Xcel_Energy-AR2018.pdf> displays white, decorated text on top of an image.

When I run the PDFBox utility PrintImageLocations<https://github.com/atsuoishimoto/pdfbox-ja/blob/master/src/main/java/org/apache/pdfbox/examples/util/PrintImageLocations.java>, these graphics are not extracted as an image. Only the background image is extracted, without the white decorated text. When the PDF is converted to a Word document, however, the decorated text comes out as a shape whose properties can be modified, such as fill color, border color, and many others.

Is it possible to extract that shape from the PDF using PDFBox? If so, how?
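One possible approach to the question above, as a minimal sketch under stated assumptions: it assumes PDFBox 2.0.x and assumes the decorated text is painted as vector paths (not as an image and not as font glyphs). A PDF content stream contains path construction and painting operators rather than shape objects, so the sketch subclasses PDFBox's `PDFGraphicsStreamEngine`, accumulates the path segments it reports, and records each path with the fill and stroke colors in effect when it is painted. `VectorShapeCollector` and `PaintedShape` are hypothetical names for illustration, not PDFBox classes.

```java
import java.awt.geom.GeneralPath;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;

// Hypothetical helper (not part of PDFBox), written against the PDFBox 2.0.x API:
// records every filled and/or stroked vector path on one page.
public class VectorShapeCollector extends PDFGraphicsStreamEngine {

    /** One painted path: outline in user space, paint operation, colors as 0xRRGGBB (-1 if unknown). */
    public static final class PaintedShape {
        public final GeneralPath outline;
        public final String operation;
        public final int fillRgb;
        public final int strokeRgb;

        PaintedShape(GeneralPath outline, String operation, int fillRgb, int strokeRgb) {
            this.outline = outline;
            this.operation = operation;
            this.fillRgb = fillRgb;
            this.strokeRgb = strokeRgb;
        }
    }

    public final List<PaintedShape> shapes = new ArrayList<>();
    private GeneralPath current = new GeneralPath();

    public VectorShapeCollector(PDPage page) {
        super(page);
    }

    // Path construction: PDFBox passes points already transformed by the current transformation matrix.
    @Override public void moveTo(float x, float y) { current.moveTo(x, y); }
    @Override public void lineTo(float x, float y) { current.lineTo(x, y); }
    @Override public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) {
        current.curveTo(x1, y1, x2, y2, x3, y3);
    }
    @Override public Point2D getCurrentPoint() { return current.getCurrentPoint(); }
    @Override public void closePath() { current.closePath(); }
    @Override public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
        current.moveTo(p0.getX(), p0.getY());
        current.lineTo(p1.getX(), p1.getY());
        current.lineTo(p2.getX(), p2.getY());
        current.lineTo(p3.getX(), p3.getY());
        current.closePath();
    }

    // Painting operators: snapshot the path with the colors in effect, then start a new path.
    @Override public void fillPath(int windingRule) { record("fill", windingRule); }
    @Override public void fillAndStrokePath(int windingRule) { record("fill+stroke", windingRule); }
    @Override public void strokePath() { record("stroke", current.getWindingRule()); }
    @Override public void endPath() { current = new GeneralPath(); } // "n": path discarded without painting
    @Override public void clip(int windingRule) { }                 // clipping paths are not painted
    @Override public void drawImage(PDImage pdImage) { }            // images: PrintImageLocations covers these
    @Override public void shadingFill(COSName shadingName) { }      // "sh" smooth shadings ignored here

    private void record(String operation, int windingRule) {
        current.setWindingRule(windingRule);
        int fill = toRgb(getGraphicsState().getNonStrokingColor());
        int stroke = toRgb(getGraphicsState().getStrokingColor());
        shapes.add(new PaintedShape(current, operation, fill, stroke));
        current = new GeneralPath();
    }

    private static int toRgb(PDColor color) {
        try {
            return color.toRGB();
        } catch (IOException | UnsupportedOperationException e) {
            return -1; // e.g. pattern colors have no single RGB value
        }
    }

    public static void main(String[] args) throws IOException {
        // PDFBox 2.0.x loading; PDFBox 3.0 uses Loader.loadPDF(new File(args[0])) instead.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            PDPage page = document.getPage(0); // the page in question is the first one
            VectorShapeCollector collector = new VectorShapeCollector(page);
            collector.processPage(page);
            for (PaintedShape shape : collector.shapes) {
                Rectangle2D b = shape.outline.getBounds2D();
                System.out.printf("%-11s x=%.1f y=%.1f w=%.1f h=%.1f fill=%06X stroke=%06X%n",
                        shape.operation, b.getX(), b.getY(), b.getWidth(), b.getHeight(),
                        shape.fillRgb, shape.strokeRgb);
            }
        }
    }
}
```

Run it with the PDF path as the only argument; it prints the bounds and colors of every painted path on the first page. The result is only an outline plus colors, not an editable object like the Word shape. If nothing matching the white text appears, the text may instead be real font glyphs drawn with a special text rendering mode, which this sketch does not report because glyphs shown by text operators do not go through these path callbacks.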

This question was also posted on Stack Overflow: https://stackoverflow.com/questions/70409876/how-to-extract-graphics-which-is-not-an-image

Thanks!