Posted to java-user@lucene.apache.org by Ching Zheng <zc...@gmail.com> on 2010/03/02 17:11:30 UTC

Help wanted with Indexing PDF Documents

Hi,
I have about 50 PDF documents, each around 10MB in size. I am using
PDFBox for parsing and am wondering how I can index the bookmarks together
with their corresponding page information.

I use PDDocumentOutline to get the bookmark titles, but for the targets I
only get PDNamedDestination objects, which offer no page number
information. Can someone shed some light on this? Thanks a lot.
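A minimal sketch of one way to get a page number for each bookmark, assuming PDFBox 2.x (the API available in 2010 may differ). In 2.x, `PDOutlineItem.findDestinationPage(PDDocument)` follows a `PDNamedDestination` through the catalog's named destinations (and also handles GoTo actions), returning the target `PDPage`. `PDPageTree.indexOf` then gives that page's 0-based position. The class `BookmarkPages`, its `Bookmark` holder and the 1-based page convention are illustrative choices, not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineNode;

public class BookmarkPages {

    /** Hypothetical holder: a bookmark title and its 1-based page, or -1 if unresolved. */
    public static final class Bookmark {
        public final String title;
        public final int page;

        Bookmark(String title, int page) {
            this.title = title;
            this.page = page;
        }

        @Override
        public String toString() {
            return title + " -> page " + page;
        }
    }

    /** Collects every bookmark in the outline tree, depth-first, with its page number. */
    public static List<Bookmark> collect(PDDocument doc) throws IOException {
        List<Bookmark> result = new ArrayList<>();
        PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
        if (outline != null) {
            walk(doc, outline, result);
        }
        return result;
    }

    private static void walk(PDDocument doc, PDOutlineNode node, List<Bookmark> result)
            throws IOException {
        for (PDOutlineItem item : node.children()) {
            // Resolves plain page destinations, PDNamedDestination (looked up in the
            // catalog's named destinations) and GoTo actions to the target PDPage.
            PDPage page = item.findDestinationPage(doc);
            int index = (page == null) ? -1 : doc.getPages().indexOf(page); // 0-based, -1 if absent
            result.add(new Bookmark(item.getTitle(), index < 0 ? -1 : index + 1));
            walk(doc, item, result); // descend into nested bookmarks
        }
    }

    public static void main(String[] args) throws IOException {
        // PDFBox 2.x loading call; PDFBox 3.x uses org.apache.pdfbox.Loader.loadPDF(file) instead.
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            for (Bookmark b : collect(doc)) {
                System.out.println(b);
            }
        }
    }
}
```

Each resulting `Bookmark` (title plus page) can then be indexed as its own search record.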

Re: Help wanted with Indexing PDF Documents

Posted by Ian Lea <ia...@gmail.com>.
Sounds like a question for the PDFBox mailing list.  Once you've got
the relevant info out of the PDF you can index it however you like.
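As a minimal sketch of the indexing side, assuming Lucene 8.x/9.x (the 2010-era API was different), each bookmark can become its own Lucene document with an analyzed title and a stored page number. The class, method and field names (`BookmarkIndex`, `file`, `title`, `page`) and the sample titles are hypothetical. `IndexSearcher.doc(int)` was removed in Lucene 10, where `searcher.storedFields().document(id)` replaces it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class BookmarkIndex {

    /** Adds one Lucene document per bookmark (hypothetical field layout). */
    public static void addBookmark(IndexWriter writer, String file, String title, int page)
            throws IOException {
        Document doc = new Document();
        doc.add(new StringField("file", file, Field.Store.YES)); // exact-match source file name
        doc.add(new TextField("title", title, Field.Store.YES));  // analyzed for full-text search
        doc.add(new IntPoint("page", page));                     // indexed for page-range queries
        doc.add(new StoredField("page", page));                  // stored so hits can return it
        writer.addDocument(doc);
    }

    /** Returns "title@page" for up to 10 bookmarks whose title contains the (lower-case) term. */
    public static List<String> search(Directory dir, String term) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc sd : searcher.search(new TermQuery(new Term("title", term)), 10).scoreDocs) {
                Document d = searcher.doc(sd.doc); // Lucene 8.x/9.x stored-field access
                hits.add(d.get("title") + "@" + d.getField("page").numericValue().intValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new ByteBuffersDirectory(); // in-memory index, for the example only
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            addBookmark(writer, "manual.pdf", "Introduction", 1);
            addBookmark(writer, "manual.pdf", "Installation guide", 7);
        } // closing the writer commits the documents
        System.out.println(search(dir, "installation"));
    }
}
```

Adding an `IntPoint` and a `StoredField` under the same name is the usual Lucene idiom: the point field supports queries such as `IntPoint.newRangeQuery("page", 5, 10)`, while the stored field returns the value with each hit.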


--
Ian.

