Posted to java-user@lucene.apache.org by Gong Li <ee...@gmail.com> on 2011/02/18 19:29:59 UTC

Lucene: If I have pictures, tables, or something else in the PDF

Hi,

I am developing a local PDF search engine using the PDFBox and Lucene APIs.

I need to show the user the PDF page that contains the keywords (highlighted,
if possible), with results sorted by relevance (Lucene's default). How can I
do this?

Also, if a PDF page contains pictures, how can that page be displayed to the
user after I have indexed and searched only the extracted text?

Thanks

Re: Lucene: If I have pictures, tables, or something else in the PDF

Posted by Simon Willnauer <si...@googlemail.com>.
Hi Gong Li,

Your question is outside the scope of this mailing list.

thanks,

simon




Re: Lucene: If I have pictures, tables, or something else in the PDF

Posted by Alexander Aristov <al...@gmail.com>.
Your search engine would extract only the text content from a PDF file; all
markup, pictures, etc. would be lost. So when you search, you would get only
text back, highlighted or not.
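
One possible way around this, offered only as a rough sketch and not as
something stated in this thread: index the extracted text of each page
separately and keep the file name and page number with it. A hit can then be
mapped back to the original PDF page, which a viewer can render with its
pictures intact. The code below uses a plain-Java map as a stand-in for the
index; it is not Lucene's or PDFBox's API, and the names PageIndexSketch, Hit,
addPage and search are hypothetical. The occurrence count is only a toy
substitute for real relevance scoring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java stand-in for a per-page index (assumption: NOT Lucene's API). */
class PageIndexSketch {

    /** One search hit: which file, which page, and how often the term occurs there. */
    static final class Hit {
        final String file;
        final int page;   // 1-based page number in the original PDF
        final int count;  // occurrences of the term on that page (toy relevance)

        Hit(String file, int page, int count) {
            this.file = file;
            this.page = page;
            this.count = count;
        }
    }

    // Maps each lower-cased term to the pages it occurs on.
    private final Map<String, List<Hit>> postings = new HashMap<>();

    /** Index the text extracted from a single page (the extraction step is assumed to happen elsewhere). */
    void addPage(String file, int page, String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Hit(file, page, e.getValue()));
        }
    }

    /** Return the pages containing the term, pages with the most occurrences first. */
    List<Hit> search(String term) {
        List<Hit> hits = new ArrayList<>(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>()));
        hits.sort(Comparator.comparingInt((Hit h) -> h.count).reversed());
        return hits;
    }

    public static void main(String[] args) {
        PageIndexSketch index = new PageIndexSketch();
        index.addPage("report.pdf", 1, "Introduction and overview");
        index.addPage("report.pdf", 2, "Figure 3 shows the engine. The engine is small.");
        for (Hit h : index.search("engine")) {
            // The file name and page number are what a viewer would need to open the original page.
            System.out.println(h.file + " page " + h.page + " (" + h.count + " matches)");
        }
    }
}
```

With real libraries, the same idea would mean one indexed document per PDF
page, with the file name and page number kept as stored fields next to the
page text.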


Best Regards
Alexander Aristov

