Posted to java-user@lucene.apache.org by Panos Konstantinidis <gi...@yahoo.com> on 2008/02/09 01:59:12 UTC

recall/precision with lucene

Hello, I am a new Lucene user. I am trying to calculate the recall/precision of
a query and I was wondering if Lucene provides an easy way to do it.

Currently I have a number of documents that should match a given query. Then I
run a search and get back all the Hits. I then divide the number of
documents that came back from Lucene (the Hits size) by the number of
documents it should have returned. This is how I calculate the recall.

For precision I just take the hits.score() of each relevant document. I am not
sure if I am on the right track or if there is an easier/better way to do it. I
would appreciate any insight into this.
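For reference, the usual set-based definitions compare the retrieved documents with the relevant ones: recall = |relevant ∩ retrieved| / |relevant| and precision = |relevant ∩ retrieved| / |retrieved|. Dividing the total number of hits by the number of relevant documents equals recall only when every hit is relevant, and a hit's score is a ranking weight, not a precision value. A minimal sketch, assuming documents are identified by plain string IDs (the class and method names are illustrative, not part of Lucene):

```java
import java.util.HashSet;
import java.util.Set;

// Set-based precision and recall for a single query.
// Document IDs are assumed to be plain strings (hypothetical unique-ID values).
class PrecisionRecall {

    // Number of retrieved documents that are also in the relevant set.
    static int relevantRetrieved(Set<String> relevant, Set<String> retrieved) {
        Set<String> both = new HashSet<>(retrieved);
        both.retainAll(relevant);
        return both.size();
    }

    // recall = |relevant intersect retrieved| / |relevant|
    // Returns 0.0 for an empty relevant set (strictly undefined; a convention of this sketch).
    static double recall(Set<String> relevant, Set<String> retrieved) {
        if (relevant.isEmpty()) return 0.0;
        return (double) relevantRetrieved(relevant, retrieved) / relevant.size();
    }

    // precision = |relevant intersect retrieved| / |retrieved|
    // Returns 0.0 when nothing was retrieved (same convention as above).
    static double precision(Set<String> relevant, Set<String> retrieved) {
        if (retrieved.isEmpty()) return 0.0;
        return (double) relevantRetrieved(relevant, retrieved) / retrieved.size();
    }

    public static void main(String[] args) {
        Set<String> relevant = Set.of("d1", "d2", "d3", "d4");
        Set<String> retrieved = Set.of("d1", "d2", "d7", "d8", "d9");
        // 2 of the 4 relevant docs were retrieved; 2 of the 5 retrieved docs are relevant.
        System.out.println("recall    = " + recall(relevant, retrieved));    // recall    = 0.5
        System.out.println("precision = " + precision(relevant, retrieved)); // precision = 0.4
    }
}
```

Note that the naive ratio in this example would be 5 / 4 = 1.25, which cannot be a recall value.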

Regards

Panos



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: recall/precision with lucene

Posted by Paul Elschot <pa...@xs4all.nl>.
On Saturday 09 February 2008 01:59:12, Panos Konstantinidis wrote:
> Hello, I am a new Lucene user. I am trying to calculate the recall/precision of
> a query and I was wondering if Lucene provides an easy way to do it.
>
> Currently I have a number of documents that should match a given query. Then I
> run a search and get back all the Hits. I then divide the number of
> documents that came back from Lucene (the Hits size) by the number of
> documents it should have returned. This is how I calculate the recall.

Since you're going to use all hits for the query, it is normally better to avoid
Hits and use a HitCollector or a TopDocs.
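As a sketch of the callback pattern: in Lucene 2.x a HitCollector subclass overrides collect(int doc, float score), which is called once for every matching document when the collector is passed to Searcher.search(Query, HitCollector). So that the example runs without Lucene on the classpath, it uses a hypothetical ScoreCollector interface with the same method shape and a fake search that replays made-up (doc, score) pairs:

```java
import java.util.BitSet;

class CollectAllDemo {
    // Hypothetical stand-in for searcher.search(query, collector): it simply
    // replays pre-computed (doc, score) pairs through the callback.
    static void fakeSearch(int[] docs, float[] scores, ScoreCollector c) {
        for (int i = 0; i < docs.length; i++) {
            c.collect(docs[i], scores[i]);
        }
    }

    public static void main(String[] args) {
        AllMatchesCollector collector = new AllMatchesCollector();
        fakeSearch(new int[] {3, 7, 12}, new float[] {0.9f, 0.4f, 0.1f}, collector);
        System.out.println(collector.count() + " matches: " + collector.matches); // 3 matches: {3, 7, 12}
    }
}

// Hypothetical stand-in for Lucene 2.x's HitCollector, an abstract class whose
// callback is: public abstract void collect(int doc, float score)
interface ScoreCollector {
    void collect(int doc, float score);
}

// Records every matching document number in a BitSet; unlike Hits, nothing is
// ranked or cached, so the cost per match is a single bit set.
class AllMatchesCollector implements ScoreCollector {
    final BitSet matches = new BitSet();

    @Override
    public void collect(int doc, float score) {
        matches.set(doc);
    }

    int count() {
        return matches.cardinality();
    }
}
```

The resulting set of doc numbers is what the retrieved side of a recall/precision computation needs.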
 
> For precision I just take the hits.score() of each relevant document. I am not
> sure if I am on the right track or if there is an easier/better way to do it. I
> would appreciate any insight into this.

To use the score value for precision, one could define a cutoff score, but then
the recall calculation would also need to be adapted to count only the hits at
or above that cutoff. A HitCollector would be a good fit for this.
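A minimal sketch of that idea, assuming the relevant documents are known as a set of Lucene document numbers: hits scoring below the cutoff are treated as not retrieved, so both precision and recall are computed over the hits at or above it. The CutoffCollector class, the cutoff value, and the judgments are hypothetical; in real code collect() would be the override of HitCollector.collect(int, float).

```java
import java.util.Set;

class CutoffDemo {
    public static void main(String[] args) {
        Set<Integer> relevant = Set.of(3, 12, 20, 30, 31); // hypothetical judgments
        CutoffCollector c = new CutoffCollector(0.3f, relevant);
        int[] docs = {3, 7, 12, 20, 25};
        float[] scores = {0.9f, 0.4f, 0.1f, 0.35f, 0.5f};
        for (int i = 0; i < docs.length; i++) {
            c.collect(docs[i], scores[i]);
        }
        // Above the cutoff: 3, 7, 20, 25 -> 2 of them relevant; 5 relevant docs in total.
        System.out.println("precision = " + c.precision()); // precision = 0.5
        System.out.println("recall    = " + c.recall());    // recall    = 0.4
    }
}

// Counts hits at or above a score cutoff and how many of those are relevant.
class CutoffCollector {
    private final float cutoff;
    private final Set<Integer> relevant;
    private int retrieved = 0;
    private int relevantRetrieved = 0;

    CutoffCollector(float cutoff, Set<Integer> relevant) {
        this.cutoff = cutoff;
        this.relevant = relevant;
    }

    // Same shape as Lucene 2.x HitCollector.collect(int doc, float score).
    void collect(int doc, float score) {
        if (score < cutoff) return; // below the cutoff: treated as not retrieved
        retrieved++;
        if (relevant.contains(doc)) relevantRetrieved++;
    }

    double precision() {
        return retrieved == 0 ? 0.0 : (double) relevantRetrieved / retrieved;
    }

    // The denominator is every relevant doc, including ones that never matched.
    double recall() {
        return relevant.isEmpty() ? 0.0 : (double) relevantRetrieved / relevant.size();
    }
}
```

Raw scores are not comparable across different queries, so a fixed cutoff like 0.3 only makes sense per query.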

If you want the results sorted by decreasing score, have a look at the search
methods that return TopDocs. From these, one can make a precision/recall graph
for the query by considering all the results that score higher than a given
value.
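Since TopDocs.scoreDocs is ordered by decreasing score, cutting the list after each position is equivalent to sweeping the score threshold downward (ties aside), and each cut yields one (precision, recall) point. A sketch, assuming the doc numbers have already been copied out of scoreDocs into an int array and that the relevance judgments are a set of doc numbers (both hypothetical here):

```java
import java.util.Locale;
import java.util.Set;

class PrCurve {
    // Returns one {precision, recall} pair per rank k = 1..n.
    // rankedDocs must be in decreasing-score order, like TopDocs.scoreDocs.
    static double[][] curve(int[] rankedDocs, Set<Integer> relevant) {
        double[][] points = new double[rankedDocs.length][2];
        int hits = 0; // relevant docs seen so far
        for (int k = 0; k < rankedDocs.length; k++) {
            if (relevant.contains(rankedDocs[k])) hits++;
            points[k][0] = (double) hits / (k + 1); // precision at rank k+1
            points[k][1] = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size(); // recall at rank k+1
        }
        return points;
    }

    public static void main(String[] args) {
        int[] ranked = {12, 3, 7, 20}; // doc numbers, best score first
        Set<Integer> relevant = Set.of(3, 20);
        double[][] pts = curve(ranked, relevant);
        for (int k = 0; k < pts.length; k++) {
            System.out.printf(Locale.ROOT, "k=%d precision=%.2f recall=%.2f%n",
                    k + 1, pts[k][0], pts[k][1]);
        }
    }
}
```

Plotting precision against recall over these points gives the per-query graph.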

When a lot of such computations are needed, you may also want to cache the
values of a unique identifier field for all indexed docs; have a look at
FieldCache for this.
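In Lucene 2.x, FieldCache.DEFAULT.getStrings(reader, field) returns a String[] indexed by document number, which turns each hit's doc number into its unique ID without loading the stored document. This matters because relevance judgments are usually kept as external IDs: internal doc numbers can change when segments are merged after deletions. The sketch below uses a hand-built array in place of the cached one; the field name "id" and the ID values are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

class IdMapping {
    // idsByDoc[doc] holds the unique-ID field value of document `doc`.
    // In Lucene 2.x this array could come from FieldCache.DEFAULT.getStrings(reader, "id");
    // here it is a hand-built stand-in.
    static Set<String> toExternalIds(int[] hitDocs, String[] idsByDoc) {
        Set<String> out = new HashSet<>();
        for (int doc : hitDocs) {
            out.add(idsByDoc[doc]); // array lookup instead of loading the stored document
        }
        return out;
    }

    public static void main(String[] args) {
        String[] idsByDoc = {"A-100", "A-101", "A-102", "A-103"}; // hypothetical ID values
        Set<String> retrieved = toExternalIds(new int[] {0, 2, 3}, idsByDoc);
        Set<String> relevant = Set.of("A-102", "A-200"); // hypothetical relevance judgments
        Set<String> both = new HashSet<>(retrieved);
        both.retainAll(relevant);
        System.out.println("relevant retrieved: " + both); // relevant retrieved: [A-102]
    }
}
```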

Regards,
Paul Elschot



Re: recall/precision with lucene

Posted by Doron Cohen <cd...@gmail.com>.
Take a look at the quality package under contrib/benchmark.

Regards,
Doron
