Posted to java-user@lucene.apache.org by Dan Rich <is...@yahoo.com> on 2007/10/04 09:12:15 UTC

Scorer skipTo() expectations?

Hi, 

I have a custom Query class that provides a long list of Lucene docIds (not for filtering purposes). It is one clause in a standard BooleanQuery, which also contains TermQuery instances.

I have a custom Scorer that goes along with the custom Query class. 

What (if any) document ordering requirements does the Scorer class have for its skipTo(int docId) method?

In particular, I currently sort the docIds and return them in ascending order from my custom Query class. That can be expensive for large docId lists; is sorting necessary? It looks like skipTo() might expect the documents it gets to be in ascending order to behave correctly as part of a BooleanQuery, but I can't tell for sure from the documentation.

If my custom Scorer class does not return its documents in ascending order (e.g. 10, 80, 40, 60, 50), will whatever uses skipTo() potentially lose hits? If not, is there any performance concern with having the docIds unordered?



Re: Scorer skipTo() expectations?

Posted by Paul Elschot <pa...@xs4all.nl>.
Dan,

In Scorers, each time skipTo() or next() returns true after the first
time, doc() returns a larger document number than before.
When a Scorer does not return its documents in increasing order,
documents will be lost, which means that not all matching documents
will be found by the search.
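
To make the loss of hits concrete, here is a minimal sketch. It is not
Lucene code: DocIdList is a hypothetical stand-in for the iteration part
of a Scorer (next(), doc(), skipTo()), with skipTo() modelled on "call
next() until doc() >= target", and intersect() is a simplified
illustration of how a conjunction might leapfrog two clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipToOrderDemo {

    /** Hypothetical stand-in for the doc iteration part of a Scorer. */
    static final class DocIdList {
        private final int[] docs;
        private int pos = -1;

        /** Copies docIds; sorts the copy ascending when sort is true. */
        DocIdList(int[] docIds, boolean sort) {
            docs = docIds.clone();
            if (sort) {
                Arrays.sort(docs);
            }
        }

        /** Moves to the next entry; false when the list is exhausted. */
        boolean next() {
            return ++pos < docs.length;
        }

        /** Current document number; valid only after next() returned true. */
        int doc() {
            return docs[pos];
        }

        /** Advances at least once, until doc() >= target or the list ends. */
        boolean skipTo(int target) {
            do {
                if (!next()) {
                    return false;
                }
            } while (doc() < target);
            return true;
        }
    }

    /** Simplified AND of two clauses: the one that is behind skips forward. */
    static List<Integer> intersect(DocIdList a, DocIdList b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return hits;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) {
                    return hits;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return hits;
                }
            } else if (!b.skipTo(a.doc())) {
                return hits;
            }
        }
    }

    public static void main(String[] args) {
        int[] custom = {10, 80, 40, 60, 50};   // the order from the question
        int[] term = {40, 50, 60, 80};         // an ordered clause, like a TermScorer

        System.out.println(intersect(new DocIdList(custom, true),
                                     new DocIdList(term, false)));
        System.out.println(intersect(new DocIdList(custom, false),
                                     new DocIdList(term, false)));
    }
}
```

With the custom list sorted, the first call prints [40, 50, 60, 80].
With the list left as 10, 80, 40, 60, 50, the second call prints only
[80]: skipTo(40) lands on 80, the other clause then skips past 40, 50
and 60, and those hits are never seen again.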

For disjunctions (OR), one needs to merge the documents of
two Scorers using next() to iterate over the documents.
The merging is normally done on the fly using a specialized priority queue
on the doc() values in DisjunctionSumScorer.
No sorting of complete document lists is done at search time;
that is done at indexing time. And since TermScorer uses the
index directly, it will always return documents in order.
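
The on-the-fly merge can be sketched roughly as follows. This is only
an illustration of the idea, not the DisjunctionSumScorer source:
SortedDocs and union() are hypothetical names, and java.util.PriorityQueue
stands in for the specialized queue. Each sub-iterator is assumed to
deliver its documents in ascending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class DisjunctionMergeDemo {

    /** Hypothetical iterator over a docId list that is already ascending. */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int... ascendingDocIds) {
            this.docs = ascendingDocIds;
        }

        boolean next() {
            return ++pos < docs.length;
        }

        int doc() {
            return docs[pos];
        }
    }

    /** OR of the sub-iterators: each matching doc once, in ascending order. */
    static List<Integer> union(SortedDocs... subs) {
        // The queue is ordered by each sub-iterator's current doc().
        PriorityQueue<SortedDocs> queue =
                new PriorityQueue<>(Comparator.comparingInt(SortedDocs::doc));
        for (SortedDocs sub : subs) {
            if (sub.next()) {
                queue.add(sub);
            }
        }

        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Take the sub-iterator with the smallest current doc.
            SortedDocs top = queue.poll();
            int doc = top.doc();
            // Several sub-iterators may match the same doc; report it once.
            if (result.isEmpty() || result.get(result.size() - 1) != doc) {
                result.add(doc);
            }
            // Advance it and put it back while it still has documents.
            if (top.next()) {
                queue.add(top);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(union(new SortedDocs(10, 40, 80),
                                 new SortedDocs(40, 50, 60)));
    }
}
```

The example prints [10, 40, 50, 60, 80]. Only the current document of
each sub-iterator is ever compared, so no complete list is sorted at
search time, but the result is in order only because every input is.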

The only exception to document ordering is BooleanScorer.next(),
which is used by BooleanQuery for some cases of top
level disjunctions, and then only when documents are allowed
to be scored out of order. The reason for that is performance:
BooleanScorer uses a faster data structure than a priority queue,
but it does not implement skipTo().

Regards,
Paul Elschot




