Posted to users@pdfbox.apache.org by Dorian Messina <d....@wavenet.be> on 2019/01/10 12:16:33 UTC

Question about a feature

Hello,
First, thank you for PDFBox and for all the time you spend working on it to make our lives as developers easier.

I am using the library for the first time and I have one "how to" question.
I need to remove all pictures from a PDF with selectable text (I can select the text with the mouse).
Solutions exist on Stack Overflow (https://stackoverflow.com/questions/6831194/how-can-i-remove-all-images-drawings-from-a-pdf-file-and-leave-text-only-in-java) and elsewhere, but the code is old and refers to methods that no longer exist.
In particular, I cannot find this miraculous method:

resources.getImages().clear();

Does this feature still exist? Is there a simple way to achieve my goal?

Thank you

Happy new year

Dorian Messina
Analyst-Developer
Mobile : +32 493 02 63 57
d.messina@wavenet.be

Wavenet
Rue de l'artisanat, 16
7900 Leuze-en-Hainaut | Belgique
Tel : +32 69 67 03 35
www.wavenet.be




RE: Question about a feature

Posted by John Logan <jo...@texture.com>.
Hi Dorian,

 
I'd suggest starting with the RemoveAllText.java example to see the basic pattern for filtering items from the PDF token stream.

 
Adapting this example to remove each "Do" operator, together with its operand, wherever the corresponding PDXObject is an instance of PDImageXObject should work.

 
This will remove raster images, but any line art on the page will remain.
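
A rough, library-free sketch of that filtering step may help show the idea. It is not PDFBox code: tokens are modelled here as plain strings, and the class name ImageDoFilter, the method dropImageDraws and the imageNames set are all hypothetical. In a real adaptation of RemoveAllText.java the tokens would come from the PDFBox content stream parser, and the "is this an image?" check would be an instanceof PDImageXObject test on the XObject looked up from the page resources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of filtering a content stream token list.
// Assumption: names are strings starting with "/", operators are bare words.
public class ImageDoFilter {

    // Returns a copy of tokens without any "/Name Do" pair whose name is in imageNames.
    static List<String> dropImageDraws(List<String> tokens, Set<String> imageNames) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if ("Do".equals(token) && !kept.isEmpty()) {
                // The operand of "Do" is the name written just before it.
                String operand = kept.get(kept.size() - 1);
                if (operand.startsWith("/") && imageNames.contains(operand.substring(1))) {
                    kept.remove(kept.size() - 1); // drop the operand
                    continue;                     // and skip the operator itself
                }
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // "/Im0" stands for an image XObject, "/Fm0" for a form XObject (both made up).
        List<String> tokens = Arrays.asList(
                "q", "/Im0", "Do", "Q",
                "BT", "/F1", "12", "Tf", "(Hello)", "Tj", "ET",
                "/Fm0", "Do");
        Set<String> imageNames = new HashSet<>(Arrays.asList("Im0"));
        System.out.println(dropImageDraws(tokens, imageNames));
    }
}
```

With the sample tokens above, the "/Im0 Do" pair is dropped while the text operators and the "/Fm0 Do" pair are kept. Form XObjects can themselves contain images, so a complete solution would probably need to apply the same filter to their content streams too.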

 
John
