Posted to solr-dev@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2006/10/20 14:52:44 UTC

sorting in SolrIndexSearcher

I'm looking at the following code from SolrIndexSearcher.getDocListNC:

      final FieldSortedHitQueue hq = new FieldSortedHitQueue(reader,
          lsort.getSort(), offset+len);
      searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
          if (filt!=null && !filt.exists(doc)) return;
          numHits[0]++;
          hq.insert(new FieldDoc(doc, score));
        }
      }
      );
      totalHits = numHits[0];
      maxScore = totalHits>0 ? hq.getMaxScore() : 0.0f;
      nDocsReturned = hq.size();
      ids = new int[nDocsReturned];
      scores = (flags&GET_SCORES)!=0 ? new float[nDocsReturned] : null;
      for (int i = nDocsReturned -1; i >= 0; i--) {
        FieldDoc fieldDoc = (FieldDoc)hq.pop();
        // fillFields is the point where score normalization happens
        // hq.fillFields(fieldDoc)
        ids[i] = fieldDoc.doc;
        if (scores != null) scores[i] = fieldDoc.score;
      }

Why are the document IDs and scores being retrieved from the
PriorityQueue in reverse order? I'm missing something obvious.

Thanks,
Peter

Re: sorting in SolrIndexSearcher

Posted by Peter Keegan <pe...@gmail.com>.
Aha. I thought Solr was doing things differently than Lucene, but now I see
the same thing in TopFieldDocCollector. Thanks Yonik.

Peter


Re: sorting in SolrIndexSearcher

Posted by Yonik Seeley <yo...@apache.org>.
On 10/20/06, Peter Keegan <pe...@gmail.com> wrote:
> Why are the document IDs and scores being retrieved from the
> PriorityQueue in reverse order? I'm missing something obvious.

The PriorityQueue only lets you pop the *smallest* element (the
lowest-ranked hit) in log(N) time, not the largest, so we have to
retrieve hits from smallest to largest.  But since we want the highest
score first, we fill the array in reverse order, putting the smallest
in the last position and the largest in the first.

-Yonik
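
To make the reverse fill concrete, here is a minimal sketch of the same pattern. It uses java.util.PriorityQueue as a stand-in for Lucene's queue, which is an assumption made for illustration; the class ReverseDrain, the Hit type, and the method topDocsBestFirst are hypothetical names and are not taken from Solr or Lucene.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

class ReverseDrain {
    /** Hypothetical stand-in for a (doc, score) pair; not Lucene's FieldDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /**
     * Returns the ids of the n best-scoring docs, best first.
     * scores[doc] is assumed to hold the score of document number doc.
     */
    static int[] topDocsBestFirst(float[] scores, int n) {
        // Min-heap: the head is always the lowest-scoring hit currently kept.
        Comparator<Hit> byScore = (a, b) -> Float.compare(a.score, b.score);
        PriorityQueue<Hit> hq = new PriorityQueue<>(byScore);

        for (int doc = 0; doc < scores.length; doc++) {
            hq.add(new Hit(doc, scores[doc]));
            if (hq.size() > n) {
                hq.poll(); // evict the current worst so at most n hits remain
            }
        }

        // poll() yields hits worst-to-best, so fill the array from the back.
        int[] ids = new int[hq.size()];
        for (int i = ids.length - 1; i >= 0; i--) {
            ids[i] = hq.poll().doc;
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = topDocsBestFirst(new float[] {0.2f, 0.9f, 0.5f, 0.7f}, 3);
        System.out.println(Arrays.toString(ids)); // prints [1, 3, 2]
    }
}
```

Keeping the worst hit at the head is what makes eviction cheap while collecting: a new hit only has to be compared against the head. The price is that the queue can only be drained worst-first, hence the backwards loop.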