Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/04/04 03:56:29 UTC

Document scoring order?

Hi,

When Lucene scores matching documents, what is the order in which
documents are processed/scored and can that be changed?  I'm guessing
it scores matches in whichever order they are stored in the index/on
disk, which means by increasing docIDs?

I do see some out of order scoring is possible.... but can one visit
docs to score in, say, lexicographical order of a specific document
field?

Thanks,
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Document scoring order?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Otis,

They are generally processed in docID order. The special "out-of-order" case applies only to BooleanScorer (BS1), which can report document IDs to the Collector out of order because it scores documents in buckets. If you don’t allow out-of-order scoring, BooleanScorer2 (BS2) is used instead. Out-of-order processing is only a "may": a scorer "may" process documents in an undefined order, but only BS1 actually does so, and there is no way to "define the order". You simply mark your Collector as able to handle documents arriving out of docID order.
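
To illustrate why bucketed scoring can hand docIDs to a collector out of order, here is a rough sketch in plain Java. It is a toy model under stated assumptions, not Lucene's BooleanScorer code: the class name BucketScoringSketch, the method collectOrder, and the window size of 8 are all hypothetical and chosen only for the demo. Each clause is drained into the buckets of the current window in turn, and the window is then emitted in the order its buckets were first touched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Lucene code. All names here are hypothetical.
final class BucketScoringSketch {
    // Number of docIDs covered by one window of buckets (assumed value for the demo).
    static final int WINDOW = 8;

    /**
     * Scores a disjunction over several postings lists (each sorted by docID),
     * one window at a time. Within a window each clause is drained in turn, so
     * docs are emitted in the order their bucket was first touched, which is
     * not necessarily docID order.
     */
    static List<Integer> collectOrder(int[][] postings, int maxDoc) {
        List<Integer> emitted = new ArrayList<>();
        int[] pos = new int[postings.length]; // read position per clause
        for (int start = 0; start < maxDoc; start += WINDOW) {
            int end = start + WINDOW;
            // LinkedHashMap remembers the first-insertion order of the buckets.
            Map<Integer, Integer> buckets = new LinkedHashMap<>();
            for (int c = 0; c < postings.length; c++) {
                while (pos[c] < postings[c].length && postings[c][pos[c]] < end) {
                    // Toy score: number of clauses that matched this doc.
                    buckets.merge(postings[c][pos[c]], 1, Integer::sum);
                    pos[c]++;
                }
            }
            emitted.addAll(buckets.keySet()); // hand the window to the "collector"
        }
        return emitted;
    }

    public static void main(String[] args) {
        int[][] postings = { {5, 9}, {2, 5, 12} };
        System.out.println(collectOrder(postings, 16)); // prints [5, 2, 9, 12]
    }
}
```

In this model doc 5 reaches the collector before doc 2 because the first clause touched its bucket first. Order is only guaranteed between windows, not inside one, which is why the collector has to declare that it can cope with out-of-order docIDs.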

To change the docID order, you would need to re-sort your index so that the documents are stored in the preferred order; see the recently added index sorting feature. There is no other way to change the document order. Of course, you then have to use BS2 (don’t allow out-of-order scoring) to get documents in your preferred index order.
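
The effect of re-sorting can be sketched in plain Java as follows. This is a toy model, not Lucene's index sorting API: the class IndexSortSketch and the methods oldToNew and remap are hypothetical names, and the field values are made up for the example. The idea is only that new docIDs are assigned in the order of the sort field, so ordinary docID-order iteration then visits documents in field order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy model only: not Lucene's index-sorting API. All names are hypothetical.
final class IndexSortSketch {
    /**
     * Given the sort-field value of each document (indexed by its current docID),
     * returns a map where map[oldDocId] is the docID the document would get
     * after re-sorting the index lexicographically by that field.
     */
    static int[] oldToNew(String[] fieldByDocId) {
        Integer[] order = new Integer[fieldByDocId.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Stable sort: documents with equal values keep their relative order.
        Arrays.sort(order, Comparator.comparing((Integer d) -> fieldByDocId[d]));
        int[] map = new int[order.length];
        for (int newId = 0; newId < order.length; newId++) {
            map[order[newId]] = newId;
        }
        return map;
    }

    /** Remaps a postings list and re-sorts it so it is in increasing docID order again. */
    static int[] remap(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        String[] field = { "pear", "apple", "mango" }; // values for old docIDs 0, 1, 2
        int[] map = oldToNew(field);
        System.out.println(Arrays.toString(map));                         // prints [2, 0, 1]
        System.out.println(Arrays.toString(remap(new int[] {0, 1}, map))); // prints [0, 2]
    }
}
```

After the remap, a term that matched old docs 0 ("pear") and 1 ("apple") is visited as new doc 0 ("apple") first and new doc 2 ("pear") second, with no change to the scorer itself.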

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




RE: Document scoring order?

Posted by Uwe Schindler <uw...@thetaphi.de>.
> Hi Otis,
> 
> It depends on the Scorer implementation.  The default iterates through
> matching documents by calling nextDoc(), which just moves along the
> postings lists in-order, but you could roll your own.  You're pretty constrained
> by the fact that the low-level DocIdSetIterators only move forward though.

Scorer extends DocIdSetIterator and is therefore required to work in docID order. The only allowed exception is the Scorer.score(Collector) method, which *may* be out of order. See my other mail.

> I'm experimenting with some out-of-order postings lists (for example, sorted
> by frequency) to allow early search termination for disjunction queries, but
> this has its own drawbacks - if postings lists for different terms are in
> different orders, then you can't use any Scorer that calls advance().
> 
> The other thing to look at would be sorted segments, see
> https://issues.apache.org/jira/browse/LUCENE-4752.
> 
> Alan Woodward
> www.flax.co.uk
> 
> 


Re: Document scoring order?

Posted by Alan Woodward <al...@flax.co.uk>.
Hi Otis,

It depends on the Scorer implementation.  The default iterates through matching documents by calling nextDoc(), which just moves along the postings lists in-order, but you could roll your own.  You're pretty constrained by the fact that the low-level DocIdSetIterators only move forward though.

I'm experimenting with some out-of-order postings lists (for example, sorted by frequency) to allow early search termination for disjunction queries, but this has its own drawbacks - if postings lists for different terms are in different orders, then you can't use any Scorer that calls advance().
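
The advance() drawback can be seen in a small plain-Java sketch. This is a toy model that only mirrors the nextDoc()/advance() idea; it does not use Lucene's classes, and the names ForwardOnlySketch, Cursor and intersect are hypothetical. It runs the same leapfrog intersection over docID-sorted lists and over lists stored in some other order.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: mirrors the nextDoc()/advance() idea, not Lucene's actual classes.
final class ForwardOnlySketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Forward-only cursor over a postings array; it never moves backwards. */
    static final class Cursor {
        private final int[] docs;
        private int idx = -1;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int docID() {
            if (idx < 0) {
                return -1;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            idx++;
            return docID();
        }

        /** Moves forward to the first entry >= target; only valid if docs[] is sorted by docID. */
        int advance(int target) {
            int d;
            do {
                d = nextDoc();
            } while (d < target);
            return d;
        }
    }

    /** Leapfrog intersection of two postings lists using advance(). */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        Cursor ca = new Cursor(a);
        Cursor cb = new Cursor(b);
        int da = ca.nextDoc();
        int db = cb.nextDoc();
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);
                da = ca.nextDoc();
                db = cb.nextDoc();
            } else if (da < db) {
                da = ca.advance(db);
            } else {
                db = cb.advance(da);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sorted by docID: both common docs are found.
        System.out.println(intersect(new int[] {1, 4, 7}, new int[] {4, 7, 9})); // prints [4, 7]
        // Same docs, each list in a different (non-docID) order: matches are skipped.
        System.out.println(intersect(new int[] {7, 1, 4}, new int[] {4, 9, 7})); // prints []
    }
}
```

In the second call both lists still contain docs 4 and 7, but because the cursors can only move forward, each advance() skips past entries that would have matched later, so the intersection comes back empty.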

The other thing to look at would be sorted segments, see https://issues.apache.org/jira/browse/LUCENE-4752.

Alan Woodward
www.flax.co.uk

