Posted to java-user@lucene.apache.org by Gong Li <ee...@gmail.com> on 2011/02/03 18:31:29 UTC

Fwd: Lucene Problems

Hi,

I am developing an advanced PDF search engine in Java using PDFBox and
Lucene. I need to display the context of each keyword in the user
interface, but I cannot find a method to do so. Most of the methods provided
work with the whole content of the specified field, whereas I just need the
context around each keyword (i.e. a specific part of the contents of that
field).

For example, suppose the contents of the specified field are "... aaa aaa bbb
aaa aaa aaa aaa ..." and I search for "bbb". I want to display "aaa aaa *bbb*
aaa aaa" in the UI. How?

Is there any way to do this?

Thanks.

Cescy

Re: Fwd: Lucene Problems

Posted by Dawn Zoë Raison <da...@digitorial.co.uk>.
We use the contrib package 'Highlighter' to do exactly that on our PDF 
newspaper website.

Dawn


-- 

Rgds.
*Dawn Raison*
Technical Director, Digitorial Ltd.

E:dawn@digitorial.co.uk	W:http://www.digitorial.co.uk
M: 07956 609 618	        T: 01428 729 431
Reg: 04644583, England & Wales
Church Villas Ecchinswell, Newbury, RG20 4TT


Re: Lucene Problems

Posted by Ian Lea <ia...@gmail.com>.
The Lucene highlighter sounds like just what you need.
http://hrycan.com/2009/10/25/lucene-highlighter-howto/ talks about
using it on an index of PDFs. Google will find lots of other info.
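
To make the goal concrete, here is a minimal, dependency-free Java sketch of
the keyword-in-context idea from the question. It is not the Highlighter's
API or implementation: the class name KeywordContext, the method snippet and
the window parameter are hypothetical names chosen for illustration, and the
sketch assumes whitespace-separated words and a single-word keyword. The real
Highlighter works from the analyzed query and token offsets instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class KeywordContext {

    /**
     * Returns the first occurrence of keyword together with up to
     * `window` words on each side, with the keyword wrapped in asterisks.
     * Returns null if the keyword does not occur as a whole word.
     */
    static String snippet(String text, String keyword, int window) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // Assumption: words are separated by whitespace only.
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(keyword)) {
                // Clamp the context window to the bounds of the text.
                int from = Math.max(0, i - window);
                int to = Math.min(words.length, i + window + 1);
                List<String> out = new ArrayList<>();
                for (int j = from; j < to; j++) {
                    out.add(j == i ? "*" + words[j] + "*" : words[j]);
                }
                return String.join(" ", out);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String contents = "aaa aaa aaa bbb aaa aaa aaa aaa";
        // Prints: aaa aaa *bbb* aaa aaa
        System.out.println(snippet(contents, "bbb", 2));
    }
}
```

With the Highlighter you would get this behaviour from the library rather
than hand-rolling it, including handling of stemming, phrases and multiple
matches, which this sketch ignores.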


--
Ian.


