Posted to users@pdfbox.apache.org by Brad Stallion <br...@yahoo.com> on 2013/03/18 12:15:18 UTC
Exclude text from invisible layouts
Hi all,
I asked this on the Tika mailing list and was told to ask the PDFBox team:
I'm extracting text from PDF files with Tika, using my own SAX handler. The problem is that I get both visible and invisible text, i.e. text contained in invisible parts of the layout.
How can I identify the invisible parts?
I've also asked on Stack Overflow:
http://stackoverflow.com/questions/14956556/tika-and-invisible-text-from-pdf
Thanks a lot for your help!