Posted to java-user@lucene.apache.org by Ned Regina <nr...@ergito.com> on 2002/07/19 15:54:49 UTC

Locating term in search results question

I need to locate a term in the text field of a document returned in a
search result.  I'm using regular expressions, but they're not always
accurate, and Lucene doesn't seem to index positional information.
Ideally, I could use the same algorithm that matches documents in the
index, but I don't know how to go about doing that.  I'm also
concerned that searching the search results could get horribly
processor-intensive.

For example:

I've got a document with a field containing the following text:

"A chicken makes a lousy house pet."

When a user runs a search for "chicken", I'd like to be able to
accurately locate it within the results in order to highlight it.
This is a simple example that could easily be handled by a regular
expression.
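For the simple case, a whole-word, case-insensitive regular expression is enough to find the character offsets to highlight. A minimal Java sketch follows; the class and method names are hypothetical and are not part of Lucene. It does no stemming, so a query for "chicken" would not find "chickens".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds whole-word, case-insensitive occurrences of a
 * single term and returns their character offsets. No stemming is done.
 */
class SingleTermLocator {

    /** Returns {start, end} character offsets (end exclusive) of each match. */
    static List<int[]> locate(String text, String term) {
        // \b anchors the term at word boundaries, so "chick" will not match
        // inside "chicken"; Pattern.quote escapes any regex metacharacters
        // the user may have typed.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<int[]> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(new int[] {m.start(), m.end()});
        }
        return hits;
    }
}
```

For "A chicken makes a lousy house pet." and the term "chicken", this returns a single span covering characters 2 to 9.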

However, if I've got the following text:

"Cytotoxic T cells (also known as killer T cells) possess the capacity
to lyse an infected target cell."

It becomes more difficult to accurately locate the term that caused
the match if the search text is "T cell".  The regular expressions
get more and more complicated, with a higher probability of
inaccuracy.  In this case, you would have to be sure not to match the
"t cell" that spans the end of "target cell".
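One way around the regex problem is to match on tokens instead of raw characters, which is roughly what an analyzer does at index time: split the text into tokens that remember their character offsets, normalize each token, and look for the query's tokens as a consecutive run. Because whole tokens are compared, the "t" at the end of "target" can never match the query token "t". The sketch below assumes a hypothetical `normalize()` that only lowercases and strips a plural "s"; it is not a real stemmer, and a real Lucene analyzer will tokenize and normalize differently, so ideally the same analyzer used for indexing should drive the highlighting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of token-based phrase location with character offsets. */
class PhraseLocator {

    /** One token plus the character offsets it came from (end exclusive). */
    static final class Token {
        final int start, end;
        final String norm;
        Token(int start, int end, String norm) {
            this.start = start; this.end = end; this.norm = norm;
        }
    }

    /**
     * Stand-in for a real analyzer (assumption): lowercases and strips a
     * plural "s" so that "cells" matches "cell". Mangles many real words.
     */
    static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Splits on anything that is not a letter or digit, keeping offsets. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) { i++; continue; }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            tokens.add(new Token(start, i, normalize(text.substring(start, i))));
        }
        return tokens;
    }

    /** Returns {start, end} offsets for every place the whole phrase occurs. */
    static List<int[]> locate(String text, String phrase) {
        List<Token> doc = tokenize(text);
        List<Token> query = tokenize(phrase);
        List<int[]> hits = new ArrayList<>();
        if (query.isEmpty()) return hits;
        for (int i = 0; i + query.size() <= doc.size(); i++) {
            // The phrase matches only if every query token equals the
            // document token at the same relative position.
            boolean match = true;
            for (int j = 0; j < query.size() && match; j++) {
                match = doc.get(i + j).norm.equals(query.get(j).norm);
            }
            if (match) {
                hits.add(new int[] {doc.get(i).start, doc.get(i + query.size() - 1).end});
            }
        }
        return hits;
    }
}
```

Run against the "Cytotoxic T cells ..." sentence with the query "T cell", this finds the two occurrences of "T cells" and skips "target cell".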

If Lucene had a facility to determine the position of a term in the
text, it would be much easier to highlight.  Any suggestions would be
great.  Thanks.
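Once the character offsets of the matching terms are known, by whatever means, the highlighting itself is just copying the text and wrapping each span in markup. A sketch, assuming non-overlapping spans given as {start, end} pairs with an exclusive end; the class name and the choice of tags are arbitrary and only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: wraps each {start, end} span of text in open/close markup. */
class OffsetHighlighter {

    static String highlight(String text, List<int[]> spans, String open, String close) {
        // Process spans left to right; the caller's list is not modified.
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s[0]));
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] s : sorted) {
            out.append(text, last, s[0])   // unhighlighted text before the span
               .append(open)
               .append(text, s[0], s[1])   // the matched term itself
               .append(close);
            last = s[1];
        }
        out.append(text.substring(last));  // remainder after the final span
        return out.toString();
    }
}
```

With the span {2, 9} and the tags `<b>`/`</b>`, the chicken sentence becomes "A <b>chicken</b> makes a lousy house pet."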


Ned Regina
www.ergito.com


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Locating term in search results question

Posted by Ned Regina <nr...@ergito.com>.
This has put me on the right track.  Thank you very much for the
lightning-quick response.

-Ned



At 10:12 AM 7/19/2002, you wrote:
>Check
>http://www.iq-computing.de/lucene/highlight.htm
>
>-Rakesh.


Re: Locating term in search results question

Posted by Rakesh Ayilliath <ay...@hotmail.com>.
Check
http://www.iq-computing.de/lucene/highlight.htm

-Rakesh.


----- Original Message -----
From: "Ned Regina" <nr...@ergito.com>
To: <lu...@jakarta.apache.org>
Sent: Friday, July 19, 2002 7:24 PM
Subject: Locating term in search results question