Posted to users@pdfbox.apache.org by Brad Stallion <br...@yahoo.com> on 2013/03/18 12:15:18 UTC

Exclude text from invisible layouts

Hi all,

I asked this on the Tika mailing list and was told to ask the PDFBox team:

I'm extracting text from PDF files using Tika with my own SAX handler. The problem is that I get both visible and invisible text, i.e. text contained in invisible parts of the layout.
How can I identify the invisible parts?
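
To make the setup concrete, here is a minimal sketch of the kind of handler described above. It is an illustration only: the class name TextCollector and the helper extract are hypothetical, and a plain JDK SAX parser fed a small XHTML string stands in for the SAX events that Tika would produce from a PDF. It shows why the handler alone cannot tell the two kinds of text apart: characters() receives only the characters, with no visibility information attached.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates every character event it receives.
public class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the raw characters arrive here; nothing says whether
        // the text was visible in the original document.
        text.append(ch, start, length);
    }

    public String getText() {
        return text.toString();
    }

    // Stand-in for the real pipeline: parses an XHTML string with the
    // JDK SAX parser instead of running Tika on a PDF file.
    public static String extract(String xhtml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>visible</p><p>hidden</p></body></html>";
        System.out.println(extract(xhtml)); // prints "visiblehidden"
    }
}
```

Both paragraphs end up in the output, which is the behaviour I would like to avoid for the invisible parts.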

I've also asked on Stack Overflow:

http://stackoverflow.com/questions/14956556/tika-and-invisible-text-from-pdf

Thanks a lot for your help!