Posted to dev@pdfbox.apache.org by Adam Nichols <Ad...@swmc.com> on 2009/05/23 00:21:14 UTC
Get page number for bookmark
I'm trying to parse PDF files, look at all the bookmarks, determine which
page each one is on, and then split the document based on the bookmarks.
For now, I'm not worried about supporting name-based bookmarks.
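For reference, once every bookmark has a page number, the split itself is simple arithmetic. Below is a minimal sketch, assuming the bookmark pages have already been resolved to zero-based page indices. BookmarkSplitRanges, splitRanges and PageRange are hypothetical names for illustration, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class BookmarkSplitRanges {
    // Hypothetical helper type: an inclusive range of zero-based page indices.
    public record PageRange(int first, int last) {}

    // Hypothetical helper: turns bookmark start pages into contiguous chunks.
    public static List<PageRange> splitRanges(List<Integer> bookmarkPages, int pageCount) {
        List<PageRange> ranges = new ArrayList<>();
        if (pageCount <= 0) {
            return ranges; // nothing to split
        }
        // Sorted, de-duplicated start pages; out-of-range values are ignored.
        TreeSet<Integer> starts = new TreeSet<>();
        starts.add(0); // pages before the first bookmark become their own chunk
        for (int p : bookmarkPages) {
            if (p >= 0 && p < pageCount) {
                starts.add(p);
            }
        }
        List<Integer> sorted = new ArrayList<>(starts);
        for (int i = 0; i < sorted.size(); i++) {
            int first = sorted.get(i);
            // Each chunk ends just before the next bookmark, or at the last page.
            int last = (i + 1 < sorted.size()) ? sorted.get(i + 1) - 1 : pageCount - 1;
            ranges.add(new PageRange(first, last));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Bookmarks on pages 3 and 7 of a 10-page document.
        System.out.println(splitRanges(List.of(3, 7), 10));
    }
}
```

Whether pages before the first bookmark should form their own chunk or be merged into the first one is a judgment call; this sketch keeps them separate so that no pages are dropped.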
I can loop through the bookmarks with no problem (as in the PrintBookmarks
example), but the PDOutlineItem objects all have a null Destination. I
looked at the PDF in a hex editor and read the PDF spec, and it looks like
this is because the bookmark entry has no "Dest" key. Instead, it refers to
a "GoTo" action, which in turn references the page object "2 0 R".
Here's my bookmark:
102 0 obj
<</Parent 101 0 R/A 100 0 R/Next 104 0 R/Title(A002\r)>>
endobj
And here is the action I believe it references:
100 0 obj
<</D[2 0 R/FitH 795]/S/GoTo>>
endobj
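The lookup order described above (use /Dest if present, otherwise follow /A to a GoTo action and take its /D array) can be sketched without any library at all. This is a minimal sketch over a hypothetical in-memory model, not PDFBox's COS classes: dictionaries are Map<String, Object> with keys minus the leading slash, arrays are List<Object>, names and strings are plain String, and Ref stands in for an indirect reference such as "2 0 R". BookmarkTarget, targetPage and Ref are illustrative names only.

```java
import java.util.List;
import java.util.Map;

public class BookmarkTarget {
    // Hypothetical model of an indirect reference such as "2 0 R".
    public record Ref(int num, int gen) {}

    // Hypothetical model: `objects` maps each indirect reference to its object.
    // Returns the page reference a bookmark points at, or null if none is found.
    public static Ref targetPage(Map<String, Object> bookmark, Map<Ref, Object> objects) {
        Object dest = resolve(bookmark.get("Dest"), objects);
        if (dest == null) {
            // No /Dest entry: look for an /A action, which may be an indirect object.
            Object action = resolve(bookmark.get("A"), objects);
            if (action instanceof Map<?, ?> a && "GoTo".equals(a.get("S"))) {
                dest = resolve(a.get("D"), objects);
            }
        }
        // An explicit destination is an array whose first element is the page,
        // e.g. [2 0 R /FitH 795].
        if (dest instanceof List<?> arr && !arr.isEmpty() && arr.get(0) instanceof Ref page) {
            return page;
        }
        // Named destinations and other action types are not handled here.
        return null;
    }

    // Follows one level of indirection; direct values are returned unchanged.
    private static Object resolve(Object value, Map<Ref, Object> objects) {
        return (value instanceof Ref r) ? objects.get(r) : value;
    }
}
```

Fed the two objects above (bookmark 102 with /A 100 0 R, and action 100 with /D [2 0 R /FitH 795]), this returns Ref(2, 0). That reference is still only an object ID; turning it into a page number is a separate step.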
If I dig into the COSDictionary, I can pull out the {2, 0} reference as a
COSObject. My problem now is determining which page number that is; I
don't see anything in the library that lets me do this. It seems strange
that nobody has ever needed to get a page number from a bookmark, but I
checked the users mailing list, searched the mailing list archives, and
spent about an hour on Google trying to track down a solution.
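One way to get from an object reference like {2, 0} to a page number is to walk the page tree once. Leaf /Page objects, visited depth-first in /Kids order, appear in document order, so the walk can build a reverse map from each page's reference to its index. Below is a minimal sketch over the same kind of hypothetical in-memory model (dictionaries as Map<String, Object>, Ref for indirect references). PageIndex, buildIndex and Ref are illustrative names, not PDFBox API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PageIndex {
    // Hypothetical model of an indirect reference such as "2 0 R".
    public record Ref(int num, int gen) {}

    // Maps each page object's reference to its zero-based page index by walking
    // the page tree depth-first in /Kids order (the document's page order).
    public static Map<Ref, Integer> buildIndex(Ref pagesRoot, Map<Ref, Map<String, Object>> objects) {
        Map<Ref, Integer> index = new HashMap<>();
        walk(pagesRoot, objects, index, new HashSet<>());
        return index;
    }

    private static void walk(Ref node, Map<Ref, Map<String, Object>> objects,
                             Map<Ref, Integer> index, Set<Ref> visited) {
        if (!visited.add(node)) {
            return; // already seen: guards against cycles in a malformed tree
        }
        Map<String, Object> dict = objects.get(node);
        if (dict == null) {
            return; // dangling reference: skip it
        }
        if ("Page".equals(dict.get("Type"))) {
            index.put(node, index.size()); // leaf: next page in document order
            return;
        }
        // Intermediate /Pages node: visit its children in /Kids order.
        if (dict.get("Kids") instanceof List<?> kids) {
            for (Object kid : kids) {
                if (kid instanceof Ref r) {
                    walk(r, objects, index, visited);
                }
            }
        }
    }
}
```

The index is built once per document in time proportional to the number of pages, after which each bookmark lookup is a single map access (add 1 for a 1-based page number). This addresses the efficiency concern without needing a new accessor on PDPage, although where such a helper would belong in the library is a separate question.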
If this feature is missing, I have no problem adding it, but I'd need some
help making sure I do it as efficiently as possible and put it in the
correct place in the library. Perhaps it would be as easy as adding an
accessor for the page's object ID in the PDPage class?
Thanks,
Adam