Posted to dev@pdfbox.apache.org by Adam Nichols <Ad...@swmc.com> on 2009/05/23 00:21:14 UTC

Get page number for bookmark

I'm trying to parse PDF files, look at all the bookmarks, determine what 
page they are on, and then split the document based on the bookmarks.  For 
now, I'm not worried about supporting name-based bookmarks.
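
To make the splitting step concrete, here is a rough sketch of turning bookmark pages into split ranges once the page numbers are known. It is plain Java and does not touch PDFBox; the class and method names are made up for illustration. It assumes zero-based page indexes, and that any pages before the first bookmark become their own leading chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of PDFBox: turns bookmark start pages
// into half-open page ranges [start, end), zero-based.
final class BookmarkSplitRanges {

    static List<int[]> ranges(List<Integer> bookmarkPages, int pageCount) {
        // Sort and de-duplicate so two bookmarks on one page give one cut.
        TreeSet<Integer> cuts = new TreeSet<>();
        for (int p : bookmarkPages) {
            if (p >= 0 && p < pageCount) {
                cuts.add(p); // ignore pages outside the document
            }
        }
        // Assumption: pages before the first bookmark form a leading chunk.
        if (pageCount > 0) {
            cuts.add(0);
        }

        List<int[]> out = new ArrayList<>();
        Integer prev = null;
        for (int cut : cuts) {
            if (prev != null) {
                out.add(new int[] {prev, cut});
            }
            prev = cut;
        }
        if (prev != null) {
            out.add(new int[] {prev, pageCount}); // last chunk runs to the end
        }
        return out;
    }
}
```

With bookmarks on zero-based pages 1 and 3 of a 5-page document, this gives the ranges [0,1), [1,3) and [3,5).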

I can loop through the bookmarks with no problem (like the PrintBookmarks 
example), but the PDOutlineItem objects all have a null Destination.  I 
looked at the PDF in a hex editor and read the PDF spec, and it looks like 
this is because the bookmark entry doesn't have a "Dest" key.  Instead it 
refers to a "GoTo" action, which in turn references the page object 
"2 0 R".

Here's my bookmark:
102 0 obj
<</Parent 101 0 R/A 100 0 R/Next 104 0 R/Title(A002\r)>>
endobj

And the action which I believe it's referencing:
100 0 obj
<</D[2 0 R/FitH 795]/S/GoTo>>
endobj
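
The lookup implied by these two objects can be sketched in plain Java, with Map and List standing in for PDF dictionaries and arrays. This is only a model of the logic, not PDFBox code, and the class and method names are invented. It looks for a "Dest" entry first, falls back to an "A" entry whose "S" is "GoTo", and takes the first element of the destination array as the page reference.

```java
import java.util.List;
import java.util.Map;

// Plain-Java model of the lookup; Map and List stand in for PDF
// dictionaries and arrays. Hypothetical names, not PDFBox classes.
final class BookmarkTargetLookup {

    /** Returns the page reference a bookmark points at, or null. */
    static Object pageRef(Map<String, Object> outlineItem) {
        // A destination stored directly on the bookmark is checked first.
        Object dest = outlineItem.get("Dest");
        if (dest == null) {
            // Otherwise fall back to the action, but only a GoTo action.
            Object action = outlineItem.get("A");
            if (action instanceof Map) {
                Map<?, ?> a = (Map<?, ?>) action;
                if ("GoTo".equals(a.get("S"))) {
                    dest = a.get("D");
                }
            }
        }
        // An explicit destination is an array whose first element is the page.
        if (dest instanceof List && !((List<?>) dest).isEmpty()) {
            return ((List<?>) dest).get(0);
        }
        return null; // named destinations are not handled in this sketch
    }
}
```

For the bookmark above, modelled as a map whose "A" entry holds the action with "S" set to "GoTo" and "D" set to the list ["2 0 R", "FitH", 795], the method returns "2 0 R".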

If I dig into the COSDictionary, I can pull out the {2, 0} in a COSObject. 
My problem now is determining which page number that is; I don't see 
anything in the library which allows me to do this.  It seems strange that 
nobody has ever wanted to get a page number from a bookmark, but I checked 
the users mailing list, searched the mailing list archives and spent about 
an hour on Google trying to track down a solution, without success.
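
One way to get from an object id to a page number, sketched in plain Java rather than against the real PDFBox classes, is to walk the page tree once in document order and record each page's object and generation number. PageTreeNode and PageIndex are hypothetical stand-ins, and the sketch simplifies by treating any node without kids as a page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a page-tree node; not a PDFBox class.
final class PageTreeNode {
    final int objNum;
    final int genNum;
    final List<PageTreeNode> kids = new ArrayList<>(); // empty for a page

    PageTreeNode(int objNum, int genNum, PageTreeNode... kids) {
        this.objNum = objNum;
        this.genNum = genNum;
        for (PageTreeNode k : kids) {
            this.kids.add(k);
        }
    }
}

final class PageIndex {

    /** Maps "objNum genNum" to a zero-based page index, in document order. */
    static Map<String, Integer> build(PageTreeNode root) {
        Map<String, Integer> index = new HashMap<>();
        walk(root, index);
        return index;
    }

    private static void walk(PageTreeNode node, Map<String, Integer> index) {
        if (node.kids.isEmpty()) {
            // Simplification: a node without kids is treated as a page.
            // Pages seen so far equals the map size, assuming unique ids.
            index.put(node.objNum + " " + node.genNum, index.size());
            return;
        }
        for (PageTreeNode kid : node.kids) {
            walk(kid, index); // depth-first, so kids keep document order
        }
    }
}
```

Building the map once means each bookmark lookup is a single map access instead of a fresh walk of the tree. The stored index is zero-based, so the page number a reader sees would be the index plus one.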

If this feature is missing, I have no problem adding it, but I'd need some 
help in making sure I do it as efficiently as possible and put it in the 
correct place in the library.  Perhaps it'd be as easy as adding an 
accessor for the page object id in the PDPage class?

Thanks,
Adam