Posted to java-user@lucene.apache.org by "Zhang, Lisheng" <Li...@broadvision.com> on 2007/05/23 18:46:06 UTC

How to avoid score calculation completely?

Hi,

We have been using lucene for years and it serves us well.

Sometimes when we issue a query, we only want to know
how many hits it produces and do not need any docs back. Is it possible
to skip score calculation completely and just get the total count?

I understand that score calculation needs a loop over all matched
docs. Can we avoid that loop for the sake of performance? We
want to get the total count in O(1), independent of the
number of docs.

Thanks very much for your help, Lisheng

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to avoid score calculation completely?

Posted by Michael McCandless <lu...@mikemccandless.com>.
"Yonik Seeley" <yo...@apache.org> wrote:
> On 5/24/07, Ramana Jelda <ra...@ciao-group.com> wrote:
> > But I also see the importance of ignoring score calculation.
> >
> > Setting the performance gain aside, is there any possibility of completely
> > ignoring the scoring calculation?
> 
> Yes, for unsorted results use a hit collector and no sorting will be
> done by score (or anything else).
> 
> You can also ignore the score by simply sorting on other fields.

I *think* something close to this would allow you to count the number
of docs matching a query without scoring:

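  // Lucene 2.x API: the Scorer iterates over the matching docs.
  Scorer s = query.weight(searcher).scorer(reader);
  int count = 0;
  // next() advances to the next match; score() is never called.
  while (s.next()) {
    count++;
  }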

I'm not certain that avoids all scoring work but at least for some of
the scorers it should save some CPU time; I'm not sure how much.  Also
note that the TopDocCollector (used by default if you don't provide
your own collector) does not count docs that have score <= 0.0, so the
above code fragment would overcount in such cases.
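
The overcount caveat can be illustrated with a toy model that uses no
Lucene classes at all. This is only a sketch under stated assumptions:
Match, countAllMatches, and countPositiveScores are hypothetical names.
countAllMatches mirrors the loop above, while countPositiveScores
imitates a collector that skips docs whose score is <= 0.0.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are Lucene classes.
class OvercountSketch {

    // Hypothetical stand-in for one matching document and its score.
    static final class Match {
        final int doc;
        final float score;

        Match(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Counts every match without reading the score, like the raw scorer loop.
    static int countAllMatches(List<Match> matches) {
        int count = 0;
        for (Match ignored : matches) {
            count++;
        }
        return count;
    }

    // Counts only matches with a positive score, like a collector
    // that drops docs with score <= 0.0.
    static int countPositiveScores(List<Match> matches) {
        int count = 0;
        for (Match m : matches) {
            if (m.score > 0.0f) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Match> matches = Arrays.asList(
            new Match(1, 0.8f),
            new Match(4, 0.0f),   // matches the query but scores zero
            new Match(7, 0.3f));
        System.out.println("all matches:     " + countAllMatches(matches));
        System.out.println("positive scores: " + countPositiveScores(matches));
    }
}
```

With these three matches the first count is 3 and the second is 2, which
is the size of the discrepancy to expect when zero-score docs match.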

Mike


Re: How to avoid score calculation completely?

Posted by Yonik Seeley <yo...@apache.org>.
On 5/24/07, Ramana Jelda <ra...@ciao-group.com> wrote:
> But I also see the importance of ignoring score calculation.
>
> Setting the performance gain aside, is there any possibility of completely
> ignoring the scoring calculation?

Yes, for unsorted results use a hit collector and no sorting will be
done by score (or anything else).

You can also ignore the score by simply sorting on other fields.
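
The collector pattern can be sketched without any Lucene dependency.
The names below (DocCollector, CountingCollector, searchAll) are
hypothetical stand-ins, not Lucene classes; they only imitate the shape
of a callback that receives a doc id and a score per hit. Note that in
this shape the score is still handed to the collector, so the pattern
removes the sorting work rather than the scoring itself.

```java
// Toy model only: these types imitate the collector pattern
// and are not Lucene classes.
class CollectorSketch {

    // Hypothetical callback, invoked once per matching document.
    interface DocCollector {
        void collect(int doc, float score);
    }

    // Counts hits and ignores the score; nothing is queued or sorted.
    static final class CountingCollector implements DocCollector {
        int count = 0;

        @Override
        public void collect(int doc, float score) {
            count++;
        }
    }

    // Hypothetical search loop: feeds every matching doc to the collector.
    static void searchAll(int[] matchingDocs, float[] scores, DocCollector collector) {
        for (int i = 0; i < matchingDocs.length; i++) {
            collector.collect(matchingDocs[i], scores[i]);
        }
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9, 11};
        float[] scores = {0.4f, 1.2f, 0.0f, 0.7f};
        CountingCollector counter = new CountingCollector();
        searchAll(docs, scores, counter);
        System.out.println("hit count: " + counter.count);
    }
}
```

Because the collector keeps a single int, memory use stays constant no
matter how many docs match.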

-Yonik


RE: How to avoid score calculation completely?

Posted by Ramana Jelda <ra...@ciao-group.com>.
But I also see the importance of ignoring score calculation.

Setting the performance gain aside, is there any possibility of completely
ignoring the scoring calculation?

Jelda
> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf 
> Of Yonik Seeley
> Sent: Wednesday, May 23, 2007 6:54 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to avoid score calculation completely?
> 
> On 5/23/07, Zhang, Lisheng <Li...@broadvision.com> wrote:
> > We have been using lucene for years and it serves us well.
> >
> > Sometimes when we issue a query, we only want to know how many hits it
> > produces and do not need any docs back. Is it possible to skip score
> > calculation completely and just get the total count?
> >
> > I understand that score calculation needs a loop over all matched docs.
> > Can we avoid that loop for the sake of performance? We want to get the
> > total count in O(1), independent of the number of docs.
>
> Calculating scores adds a low, fixed amount of overhead to the matching
> logic. The savings would most likely not be that large.
>
> For simple queries, it might be quickest to use TermDocs (obtained from
> IndexReader.termDocs()) to iterate over the docs matching terms yourself.
> 
> Also, see Matcher in http://issues.apache.org/jira/browse/LUCENE-584
> 
> -Yonik


Re: How to avoid score calculation completely?

Posted by Yonik Seeley <yo...@apache.org>.
On 5/23/07, Zhang, Lisheng <Li...@broadvision.com> wrote:
> We have been using lucene for years and it serves us well.
>
> Sometimes when we issue a query, we only want to know
> how many hits it produces and do not need any docs back. Is it possible
> to skip score calculation completely and just get the total count?
>
> I understand that score calculation needs a loop over all matched
> docs. Can we avoid that loop for the sake of performance? We
> want to get the total count in O(1), independent of the
> number of docs.

Calculating scores adds a low, fixed amount of overhead to the matching logic.
The savings would most likely not be that large.

For simple queries, it might be quickest to use TermDocs (obtained from
IndexReader.termDocs()) to iterate over the docs matching terms yourself.
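
To illustrate why a single-term count can be cheap while a general
query still needs a loop, here is a toy inverted index in plain Java.
It is a sketch under stated assumptions: postings, countTerm, and
countBoth are hypothetical names, and the sorted int[] arrays only
imitate the doc ids that iterating over a term's docs would yield.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a map of sorted doc-id arrays stands in for a real index.
class PostingsSketch {

    // Single term: the count is the length of the postings list,
    // so no per-document iteration is needed.
    static int countTerm(Map<String, int[]> postings, String term) {
        int[] docs = postings.get(term);
        return docs == null ? 0 : docs.length;
    }

    // Two terms ANDed: merge-walk both sorted lists and count shared doc ids.
    static int countBoth(Map<String, int[]> postings, String a, String b) {
        int[] x = postings.get(a);
        int[] y = postings.get(b);
        if (x == null || y == null) {
            return 0;
        }
        int i = 0, j = 0, count = 0;
        while (i < x.length && j < y.length) {
            if (x[i] == y[j]) {
                count++;
                i++;
                j++;
            } else if (x[i] < y[j]) {
                i++;
            } else {
                j++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<String, int[]>();
        postings.put("lucene", new int[] {1, 3, 4, 8});
        postings.put("score", new int[] {3, 5, 8, 9});
        System.out.println("lucene:           " + countTerm(postings, "lucene"));
        System.out.println("lucene AND score: " + countBoth(postings, "lucene", "score"));
    }
}
```

The single-term case is the only one here that is independent of the
number of matching docs; the AND case is linear in the combined length
of the two lists. A real index would also have to account for deleted
documents, which this toy ignores.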

Also, see Matcher in http://issues.apache.org/jira/browse/LUCENE-584

-Yonik
