Posted to users@pdfbox.apache.org by Shriram <sh...@yahoo.com> on 2012/03/06 08:29:42 UTC

Extracting text between two bookmarks using Apache PDFBox

I am using Apache PDFBox to read a PDF document whose hierarchy is defined by its bookmarks. The hierarchy is a tree, with content only at the leaf level. When I try to extract the text between two leaf-level bookmarks (using Stripper.setStartBookmark(), Stripper.setEndBookmark() and Stripper.writeText()), I get the text of the whole page instead. In short, my problem is similar to the one described at http://www.java-forums.org/advanced-java/51032-pdox-1-6-0-extract-text-between-2-bookmarks-same-page-sos.html

Is there a way to extract only the contents between two bookmarks? If so, what should I change in my code?
