You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2009/03/30 09:50:30 UTC

Bug in TopFieldCollector?

Hi

As I prepared the patch for 1575, I noticed a strange implementation in
TopFieldCollector's topDocs():

    ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
    if (fillFields) {
      for (int i = queue.size() - 1; i >= 0; i--) {
        scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
queue.pop());
      }
    } else {
      Entry entry = (FieldValueHitQueue.Entry) queue.pop();
      for (int i = queue.size() - 1; i >= 0; i--) {
        scoreDocs[i] = new FieldDoc(entry.docID,
                                    entry.score);
      }
    }

    return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
maxScore);


Notice that if fillFields is true, then documents are popped from the queue.
However if it's false, then the first document is popped out of the queue
and used to populate the entire ScoreDoc[]? I believe that's wrong, right?
Otherwise, the returned TopFieldDocs.scoreDocs array will include the same
document and score?

I noticed there's no test case for that ... TopFieldCollector's constructor
is called only from IndexSearcher.search(Weight, Filter, int, Sort, boolean
/* fillFields */) which is called from IndexSearcher.search(Weight, Filter,
int, sort) with fillFields always set to true. So perhaps that's why this
hasn't showed up?

BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
looks like this:

    FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
    ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
    for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
      scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());

    return new TopFieldDocs(totalHits, scoreDocs,
                            fshq.getFields(), fshq.getMaxScore());

It assumes fillFields is always true and always pops elements out of the
queue.

If this is a bug, I can fix it as part of 1575, as I'm touching that class
anyway. I can also add a test case ... The fix is very simple BTW, just move
the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside the
for loop.

Shai

Re: Bug in TopFieldCollector?

Posted by Shai Erera <se...@gmail.com>.
I checked where it is used, and this arg is required by FieldValueHitQueue,
by its only constructor. The array is passed to each field's getComparator
method, which uses it only for CUSTOM field indeed. There, it calls
comparatorSource.newComparator, and there's only one implementation now of
it, which ignores it entirely.
So it looks this can be easily removed.

I also added a line before construction of FieldValueHitQueue which
nullifies the array, and the tests in TestSort pass. So this convinces me
even more we can remove it.

I remvoed it and nothing breaks (compilation-wise), so I'll proceed with it
as part of 1575.

On Mon, Mar 30, 2009 at 11:49 AM, Uwe Schindler <uw...@thetaphi.de> wrote:

> You are right, I forget the sorting. And I also think, the most important
> thing would be to remove the need for the ctor in the custom sort.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Michael McCandless [mailto:lucene@mikemccandless.com]
> > Sent: Monday, March 30, 2009 10:44 AM
> > To: java-dev@lucene.apache.org
> > Subject: Re: Bug in TopFieldCollector?
> >
> > Well, IndexSearcher also sorts its readers biggest to smallest (by
> > .numDocs()) for better performance (so that the queues fill up as much
> > as possible before hitting reader transitions).
> >
> > I think it's the exception, not the rule, for when a custom comparator
> > would require the full array of sub-readers up front (vs, "on the fly"
> > which it already gets with setNextReader), so I think we should not
> > pass it in during construction.
> >
> > This really was a rote carryover from the old API which passed in the
> > top IndexReader when calling newComparator.
> >
> > Mike
> >
> > On Mon, Mar 30, 2009 at 4:39 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > > Why not call IndexSearcher.getIndexReader().getSequentialSubReaders()
> > (see
> > > http://hudson.zones.apache.org/hudson/job/Lucene-
> > trunk/javadoc/all/org/apache/lucene/index/IndexReader.html).
> > > Its public and documented as this:
> > >
> > >
> > >
> > > public IndexReader[] getSequentialSubReaders()
> > >
> > >
> > >
> > > Expert: returns the sequential sub readers that this reader is
> logically
> > > composed of. For example, IndexSearcher uses this API to drive
> searching
> > by
> > > one sub reader at a time. If this reader is not composed of sequential
> > child
> > > readers, it should return null. If this method returns an empty array,
> > that
> > > means this reader is a null reader (for example a MultiReader that has
> > no
> > > sub readers).
> > >
> > > NOTE: for a MultiSegmentReader, which is obtained by
> > open(java.lang.String)
> > > when the index has more than one segment, you should not use the sub-
> > readers
> > > returned by this method to make any changes (setNorm, deleteDocument,
> > etc.).
> > > Doing so will likely lead to index corruption. Use the parent reader
> > > instead.
> > >
> > >
> > >
> > > You only have the problem to replicate the code that gathers the
> > subreaders
> > > of the subreaders itself recursively.
> > >
> > >
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > > ________________________________
> > >
> > > From: Shai Erera [mailto:serera@gmail.com]
> > > Sent: Monday, March 30, 2009 10:20 AM
> > > To: java-dev@lucene.apache.org
> > > Subject: Re: Bug in TopFieldCollector?
> > >
> > >
> > >
> > > Already did !
> > >
> > > Another question - I think we somehow broke TopFieldCollector ...
> > > Previously, in TopFieldDocCollector, it accepted an IndexReader as a
> > > parameter, and now it requires IndexReader[], which is called
> > subReaders.
> > > Calling the 'fast' search methods with Sort has no problem obtaining
> > that
> > > IndexReader[] (and sort it), but how would someone use
> TopFieldCollector
> > w/o
> > > calling the appropriate Searcher methods?
> > >
> > > For example, since all the Searcher methods pass in fillFields = true,
> I
> > > wanted to use the method Searcher.search(Query, TopFieldCollector) in
> > the
> > > test case I wrote, which BTW looks like this:
> > >
> > >   public void testSortWithoutFillFields() throws Exception {
> > >
> > >     // There was previously a bug in TopFieldCollector when fillFields
> > was
> > > set
> > >     // to false - the same doc and score was set in ScoreDoc[] array.
> > This
> > > test
> > >     // asserts that if fillFields is false, the documents are set
> > properly.
> > > It
> > >     // does not use Searcher's default search methods (with Sort) since
> > all
> > > set
> > >     // fillFields to true.
> > >     Sort sort = new Sort();
> > >     int nDocs=10;
> > >
> > >     TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
> > >         new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
> > > false);
> > >
> > >     full.search(new MatchAllDocsQuery(), tdc);
> > >
> > >     ScoreDoc[] sd = tdc.topDocs().scoreDocs;
> > >     for (int i = 1; i < sd.length; i++) {
> > >       assertTrue(sd[i].doc != sd[i - 1].doc);
> > >     }
> > >   }
> > >
> > > You'll notice that creating a TopFieldCollector now is much more
> > complicated
> > > and *ugly*. As a user of IndexSearcher, I can only call
> getIndexReader()
> > > which returns a single IndexReader. I don't have access to
> > gatherSubReaders
> > > and sortSubReaders. I don't see why I should have access to them. So it
> > > forces me to create a dummy array with a single IndexReader.
> > >
> > > There are two ways I see to solve it:
> > > 1. Introduce a getIndexReaders() method on IndexSearcher, which will
> > return
> > > an array of (sorted?) IndexReader.
> > > 2. Introduce a new constructor in TopFieldCollector which accepts a
> > single
> > > IndexReader and make the other one package-private (for use by
> > IndexSearcher
> > > only). That constructor can internally create a dummy array of readers,
> > but
> > > at least it's private to the constructor and not exposed to the rest of
> > the
> > > world.
> > >
> > > Otherwise, I think it ruins TopFieldCollector and will make it a lot
> > less
> > > intuitive to use. At least, people who'd want to move from
> > > TopFieldDocCollector to TopFieldCollector, will find it very
> > inconvenient
> > > and strange.
> > >
> > > What do you think? I can do that (2) as part of 1575. If (1) is better,
> > then
> > > I think a different issue should be opened, because we might want to
> > return
> > > such an array as sorted or something, which makes it less trivial.
> > >
> > > Shai
> > >
> > > On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless
> > > <lu...@mikemccandless.com> wrote:
> > >
> > > Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
> > > I would say add test case & fix it under 1575.
> > >
> > > Mike
> > >
> > > On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
> > >> Hi
> > >>
> > >> As I prepared the patch for 1575, I noticed a strange implementation
> in
> > >> TopFieldCollector's topDocs():
> > >>
> > >>     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
> > >>     if (fillFields) {
> > >>       for (int i = queue.size() - 1; i >= 0; i--) {
> > >>         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
> > >> queue.pop());
> > >>       }
> > >>     } else {
> > >>       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
> > >>       for (int i = queue.size() - 1; i >= 0; i--) {
> > >>         scoreDocs[i] = new FieldDoc(entry.docID,
> > >>                                     entry.score);
> > >>       }
> > >>     }
> > >>
> > >>     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
> > >> maxScore);
> > >>
> > >>
> > >> Notice that if fillFields is true, then documents are popped from the
> > >> queue.
> > >> However if it's false, then the first document is popped out of the
> > queue
> > >> and used to populate the entire ScoreDoc[]? I believe that's wrong,
> > right?
> > >> Otherwise, the returned TopFieldDocs.scoreDocs array will include the
> > same
> > >> document and score?
> > >>
> > >> I noticed there's no test case for that ... TopFieldCollector's
> > >> constructor
> > >> is called only from IndexSearcher.search(Weight, Filter, int, Sort,
> > >> boolean
> > >> /* fillFields */) which is called from IndexSearcher.search(Weight,
> > >> Filter,
> > >> int, sort) with fillFields always set to true. So perhaps that's why
> > this
> > >> hasn't showed up?
> > >>
> > >> BTW, the now deprecated TopFieldDocCollector's topDocs()
> implementation
> > >> looks like this:
> > >>
> > >>     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
> > >>     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
> > >>     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
> > >>       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
> > >>
> > >>     return new TopFieldDocs(totalHits, scoreDocs,
> > >>                             fshq.getFields(), fshq.getMaxScore());
> > >>
> > >> It assumes fillFields is always true and always pops elements out of
> > the
> > >> queue.
> > >>
> > >> If this is a bug, I can fix it as part of 1575, as I'm touching that
> > class
> > >> anyway. I can also add a test case ... The fix is very simple BTW,
> just
> > >> move
> > >> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();"
> inside
> > >> the
> > >> for loop.
> > >>
> > >> Shai
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-dev-help@lucene.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

RE: Bug in TopFieldCollector?

Posted by Uwe Schindler <uw...@thetaphi.de>.
You are right, I forget the sorting. And I also think, the most important
thing would be to remove the need for the ctor in the custom sort.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Monday, March 30, 2009 10:44 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Bug in TopFieldCollector?
> 
> Well, IndexSearcher also sorts its readers biggest to smallest (by
> .numDocs()) for better performance (so that the queues fill up as much
> as possible before hitting reader transitions).
> 
> I think it's the exception, not the rule, for when a custom comparator
> would require the full array of sub-readers up front (vs, "on the fly"
> which it already gets with setNextReader), so I think we should not
> pass it in during construction.
> 
> This really was a rote carryover from the old API which passed in the
> top IndexReader when calling newComparator.
> 
> Mike
> 
> On Mon, Mar 30, 2009 at 4:39 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > Why not call IndexSearcher.getIndexReader().getSequentialSubReaders()
> (see
> > http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/javadoc/all/org/apache/lucene/index/IndexReader.html).
> > Its public and documented as this:
> >
> >
> >
> > public IndexReader[] getSequentialSubReaders()
> >
> >
> >
> > Expert: returns the sequential sub readers that this reader is logically
> > composed of. For example, IndexSearcher uses this API to drive searching
> by
> > one sub reader at a time. If this reader is not composed of sequential
> child
> > readers, it should return null. If this method returns an empty array,
> that
> > means this reader is a null reader (for example a MultiReader that has
> no
> > sub readers).
> >
> > NOTE: for a MultiSegmentReader, which is obtained by
> open(java.lang.String)
> > when the index has more than one segment, you should not use the sub-
> readers
> > returned by this method to make any changes (setNorm, deleteDocument,
> etc.).
> > Doing so will likely lead to index corruption. Use the parent reader
> > instead.
> >
> >
> >
> > You only have the problem to replicate the code that gathers the
> subreaders
> > of the subreaders itself recursively.
> >
> >
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> > ________________________________
> >
> > From: Shai Erera [mailto:serera@gmail.com]
> > Sent: Monday, March 30, 2009 10:20 AM
> > To: java-dev@lucene.apache.org
> > Subject: Re: Bug in TopFieldCollector?
> >
> >
> >
> > Already did !
> >
> > Another question - I think we somehow broke TopFieldCollector ...
> > Previously, in TopFieldDocCollector, it accepted an IndexReader as a
> > parameter, and now it requires IndexReader[], which is called
> subReaders.
> > Calling the 'fast' search methods with Sort has no problem obtaining
> that
> > IndexReader[] (and sort it), but how would someone use TopFieldCollector
> w/o
> > calling the appropriate Searcher methods?
> >
> > For example, since all the Searcher methods pass in fillFields = true, I
> > wanted to use the method Searcher.search(Query, TopFieldCollector) in
> the
> > test case I wrote, which BTW looks like this:
> >
> >   public void testSortWithoutFillFields() throws Exception {
> >
> >     // There was previously a bug in TopFieldCollector when fillFields
> was
> > set
> >     // to false - the same doc and score was set in ScoreDoc[] array.
> This
> > test
> >     // asserts that if fillFields is false, the documents are set
> properly.
> > It
> >     // does not use Searcher's default search methods (with Sort) since
> all
> > set
> >     // fillFields to true.
> >     Sort sort = new Sort();
> >     int nDocs=10;
> >
> >     TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
> >         new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
> > false);
> >
> >     full.search(new MatchAllDocsQuery(), tdc);
> >
> >     ScoreDoc[] sd = tdc.topDocs().scoreDocs;
> >     for (int i = 1; i < sd.length; i++) {
> >       assertTrue(sd[i].doc != sd[i - 1].doc);
> >     }
> >   }
> >
> > You'll notice that creating a TopFieldCollector now is much more
> complicated
> > and *ugly*. As a user of IndexSearcher, I can only call getIndexReader()
> > which returns a single IndexReader. I don't have access to
> gatherSubReaders
> > and sortSubReaders. I don't see why I should have access to them. So it
> > forces me to create a dummy array with a single IndexReader.
> >
> > There are two ways I see to solve it:
> > 1. Introduce a getIndexReaders() method on IndexSearcher, which will
> return
> > an array of (sorted?) IndexReader.
> > 2. Introduce a new constructor in TopFieldCollector which accepts a
> single
> > IndexReader and make the other one package-private (for use by
> IndexSearcher
> > only). That constructor can internally create a dummy array of readers,
> but
> > at least it's private to the constructor and not exposed to the rest of
> the
> > world.
> >
> > Otherwise, I think it ruins TopFieldCollector and will make it a lot
> less
> > intuitive to use. At least, people who'd want to move from
> > TopFieldDocCollector to TopFieldCollector, will find it very
> inconvenient
> > and strange.
> >
> > What do you think? I can do that (2) as part of 1575. If (1) is better,
> then
> > I think a different issue should be opened, because we might want to
> return
> > such an array as sorted or something, which makes it less trivial.
> >
> > Shai
> >
> > On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless
> > <lu...@mikemccandless.com> wrote:
> >
> > Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
> > I would say add test case & fix it under 1575.
> >
> > Mike
> >
> > On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
> >> Hi
> >>
> >> As I prepared the patch for 1575, I noticed a strange implementation in
> >> TopFieldCollector's topDocs():
> >>
> >>     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
> >>     if (fillFields) {
> >>       for (int i = queue.size() - 1; i >= 0; i--) {
> >>         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
> >> queue.pop());
> >>       }
> >>     } else {
> >>       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
> >>       for (int i = queue.size() - 1; i >= 0; i--) {
> >>         scoreDocs[i] = new FieldDoc(entry.docID,
> >>                                     entry.score);
> >>       }
> >>     }
> >>
> >>     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
> >> maxScore);
> >>
> >>
> >> Notice that if fillFields is true, then documents are popped from the
> >> queue.
> >> However if it's false, then the first document is popped out of the
> queue
> >> and used to populate the entire ScoreDoc[]? I believe that's wrong,
> right?
> >> Otherwise, the returned TopFieldDocs.scoreDocs array will include the
> same
> >> document and score?
> >>
> >> I noticed there's no test case for that ... TopFieldCollector's
> >> constructor
> >> is called only from IndexSearcher.search(Weight, Filter, int, Sort,
> >> boolean
> >> /* fillFields */) which is called from IndexSearcher.search(Weight,
> >> Filter,
> >> int, sort) with fillFields always set to true. So perhaps that's why
> this
> >> hasn't showed up?
> >>
> >> BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
> >> looks like this:
> >>
> >>     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
> >>     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
> >>     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
> >>       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
> >>
> >>     return new TopFieldDocs(totalHits, scoreDocs,
> >>                             fshq.getFields(), fshq.getMaxScore());
> >>
> >> It assumes fillFields is always true and always pops elements out of
> the
> >> queue.
> >>
> >> If this is a bug, I can fix it as part of 1575, as I'm touching that
> class
> >> anyway. I can also add a test case ... The fix is very simple BTW, just
> >> move
> >> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside
> >> the
> >> for loop.
> >>
> >> Shai
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Bug in TopFieldCollector?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well, IndexSearcher also sorts its readers biggest to smallest (by
.numDocs()) for better performance (so that the queues fill up as much
as possible before hitting reader transitions).

I think it's the exception, not the rule, for when a custom comparator
would require the full array of sub-readers up front (vs, "on the fly"
which it already gets with setNextReader), so I think we should not
pass it in during construction.

This really was a rote carryover from the old API which passed in the
top IndexReader when calling newComparator.

Mike

On Mon, Mar 30, 2009 at 4:39 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Why not call IndexSearcher.getIndexReader().getSequentialSubReaders() (see
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/IndexReader.html).
> Its public and documented as this:
>
>
>
> public IndexReader[] getSequentialSubReaders()
>
>
>
> Expert: returns the sequential sub readers that this reader is logically
> composed of. For example, IndexSearcher uses this API to drive searching by
> one sub reader at a time. If this reader is not composed of sequential child
> readers, it should return null. If this method returns an empty array, that
> means this reader is a null reader (for example a MultiReader that has no
> sub readers).
>
> NOTE: for a MultiSegmentReader, which is obtained by open(java.lang.String)
> when the index has more than one segment, you should not use the sub-readers
> returned by this method to make any changes (setNorm, deleteDocument, etc.).
> Doing so will likely lead to index corruption. Use the parent reader
> instead.
>
>
>
> You only have the problem to replicate the code that gathers the subreaders
> of the subreaders itself recursively.
>
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> ________________________________
>
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Monday, March 30, 2009 10:20 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Bug in TopFieldCollector?
>
>
>
> Already did !
>
> Another question - I think we somehow broke TopFieldCollector ...
> Previously, in TopFieldDocCollector, it accepted an IndexReader as a
> parameter, and now it requires IndexReader[], which is called subReaders.
> Calling the 'fast' search methods with Sort has no problem obtaining that
> IndexReader[] (and sort it), but how would someone use TopFieldCollector w/o
> calling the appropriate Searcher methods?
>
> For example, since all the Searcher methods pass in fillFields = true, I
> wanted to use the method Searcher.search(Query, TopFieldCollector) in the
> test case I wrote, which BTW looks like this:
>
>   public void testSortWithoutFillFields() throws Exception {
>
>     // There was previously a bug in TopFieldCollector when fillFields was
> set
>     // to false - the same doc and score was set in ScoreDoc[] array. This
> test
>     // asserts that if fillFields is false, the documents are set properly.
> It
>     // does not use Searcher's default search methods (with Sort) since all
> set
>     // fillFields to true.
>     Sort sort = new Sort();
>     int nDocs=10;
>
>     TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
>         new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
> false);
>
>     full.search(new MatchAllDocsQuery(), tdc);
>
>     ScoreDoc[] sd = tdc.topDocs().scoreDocs;
>     for (int i = 1; i < sd.length; i++) {
>       assertTrue(sd[i].doc != sd[i - 1].doc);
>     }
>   }
>
> You'll notice that creating a TopFieldCollector now is much more complicated
> and *ugly*. As a user of IndexSearcher, I can only call getIndexReader()
> which returns a single IndexReader. I don't have access to gatherSubReaders
> and sortSubReaders. I don't see why I should have access to them. So it
> forces me to create a dummy array with a single IndexReader.
>
> There are two ways I see to solve it:
> 1. Introduce a getIndexReaders() method on IndexSearcher, which will return
> an array of (sorted?) IndexReader.
> 2. Introduce a new constructor in TopFieldCollector which accepts a single
> IndexReader and make the other one package-private (for use by IndexSearcher
> only). That constructor can internally create a dummy array of readers, but
> at least it's private to the constructor and not exposed to the rest of the
> world.
>
> Otherwise, I think it ruins TopFieldCollector and will make it a lot less
> intuitive to use. At least, people who'd want to move from
> TopFieldDocCollector to TopFieldCollector, will find it very inconvenient
> and strange.
>
> What do you think? I can do that (2) as part of 1575. If (1) is better, then
> I think a different issue should be opened, because we might want to return
> such an array as sorted or something, which makes it less trivial.
>
> Shai
>
> On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>
> Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
> I would say add test case & fix it under 1575.
>
> Mike
>
> On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
>> Hi
>>
>> As I prepared the patch for 1575, I noticed a strange implementation in
>> TopFieldCollector's topDocs():
>>
>>     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
>>     if (fillFields) {
>>       for (int i = queue.size() - 1; i >= 0; i--) {
>>         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
>> queue.pop());
>>       }
>>     } else {
>>       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
>>       for (int i = queue.size() - 1; i >= 0; i--) {
>>         scoreDocs[i] = new FieldDoc(entry.docID,
>>                                     entry.score);
>>       }
>>     }
>>
>>     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
>> maxScore);
>>
>>
>> Notice that if fillFields is true, then documents are popped from the
>> queue.
>> However if it's false, then the first document is popped out of the queue
>> and used to populate the entire ScoreDoc[]? I believe that's wrong, right?
>> Otherwise, the returned TopFieldDocs.scoreDocs array will include the same
>> document and score?
>>
>> I noticed there's no test case for that ... TopFieldCollector's
>> constructor
>> is called only from IndexSearcher.search(Weight, Filter, int, Sort,
>> boolean
>> /* fillFields */) which is called from IndexSearcher.search(Weight,
>> Filter,
>> int, sort) with fillFields always set to true. So perhaps that's why this
>> hasn't showed up?
>>
>> BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
>> looks like this:
>>
>>     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
>>     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
>>     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
>>       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
>>
>>     return new TopFieldDocs(totalHits, scoreDocs,
>>                             fshq.getFields(), fshq.getMaxScore());
>>
>> It assumes fillFields is always true and always pops elements out of the
>> queue.
>>
>> If this is a bug, I can fix it as part of 1575, as I'm touching that class
>> anyway. I can also add a test case ... The fix is very simple BTW, just
>> move
>> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside
>> the
>> for loop.
>>
>> Shai
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: Bug in TopFieldCollector?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Why not call IndexSearcher.getIndexReader().getSequentialSubReaders() (see
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apach
e/lucene/index/IndexReader.html). Its public and documented as this:

 

public
<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apac
he/lucene/index/IndexReader.html> IndexReader[] getSequentialSubReaders()

 

Expert: returns the sequential sub readers that this reader is logically
composed of. For example, IndexSearcher uses this API to drive searching by
one sub reader at a time. If this reader is not composed of sequential child
readers, it should return null. If this method returns an empty array, that
means this reader is a null reader (for example a MultiReader that has no
sub readers). 

NOTE: for a MultiSegmentReader, which is obtained by
<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apac
he/lucene/index/IndexReader.html#open(java.lang.String)>
open(java.lang.String) when the index has more than one segment, you should
not use the sub-readers returned by this method to make any changes
(setNorm, deleteDocument, etc.). Doing so will likely lead to index
corruption. Use the parent reader instead. 

 

You only have the problem to replicate the code that gathers the subreaders
of the subreaders itself recursively.

 

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Shai Erera [mailto:serera@gmail.com] 
Sent: Monday, March 30, 2009 10:20 AM
To: java-dev@lucene.apache.org
Subject: Re: Bug in TopFieldCollector?

 

Already did !

Another question - I think we somehow broke TopFieldCollector ...
Previously, in TopFieldDocCollector, it accepted an IndexReader as a
parameter, and now it requires IndexReader[], which is called subReaders.
Calling the 'fast' search methods with Sort has no problem obtaining that
IndexReader[] (and sort it), but how would someone use TopFieldCollector w/o
calling the appropriate Searcher methods?

For example, since all the Searcher methods pass in fillFields = true, I
wanted to use the method Searcher.search(Query, TopFieldCollector) in the
test case I wrote, which BTW looks like this:

  public void testSortWithoutFillFields() throws Exception {
    
    // There was previously a bug in TopFieldCollector when fillFields was
set
    // to false - the same doc and score was set in ScoreDoc[] array. This
test
    // asserts that if fillFields is false, the documents are set properly.
It
    // does not use Searcher's default search methods (with Sort) since all
set
    // fillFields to true.
    Sort sort = new Sort();
    int nDocs=10;
    
    TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
        new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
false);
    
    full.search(new MatchAllDocsQuery(), tdc);

    ScoreDoc[] sd = tdc.topDocs().scoreDocs;
    for (int i = 1; i < sd.length; i++) {
      assertTrue(sd[i].doc != sd[i - 1].doc);
    }
  }

You'll notice that creating a TopFieldCollector now is much more complicated
and *ugly*. As a user of IndexSearcher, I can only call getIndexReader()
which returns a single IndexReader. I don't have access to gatherSubReaders
and sortSubReaders. I don't see why I should have access to them. So it
forces me to create a dummy array with a single IndexReader.

There are two ways I see to solve it:
1. Introduce a getIndexReaders() method on IndexSearcher, which will return
an array of (sorted?) IndexReader.
2. Introduce a new constructor in TopFieldCollector which accepts a single
IndexReader and make the other one package-private (for use by IndexSearcher
only). That constructor can internally create a dummy array of readers, but
at least it's private to the constructor and not exposed to the rest of the
world.

Otherwise, I think it ruins TopFieldCollector and will make it a lot less
intuitive to use. At least, people who'd want to move from
TopFieldDocCollector to TopFieldCollector, will find it very inconvenient
and strange.

What do you think? I can do that (2) as part of 1575. If (1) is better, then
I think a different issue should be opened, because we might want to return
such an array as sorted or something, which makes it less trivial.

Shai

On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:

Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
I would say add test case & fix it under 1575.

Mike


On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
> Hi
>
> As I prepared the patch for 1575, I noticed a strange implementation in
> TopFieldCollector's topDocs():
>
>     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
>     if (fillFields) {
>       for (int i = queue.size() - 1; i >= 0; i--) {
>         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
> queue.pop());
>       }
>     } else {
>       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
>       for (int i = queue.size() - 1; i >= 0; i--) {
>         scoreDocs[i] = new FieldDoc(entry.docID,
>                                     entry.score);
>       }
>     }
>
>     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
> maxScore);
>
>
> Notice that if fillFields is true, then documents are popped from the
queue.
> However if it's false, then the first document is popped out of the queue
> and used to populate the entire ScoreDoc[]? I believe that's wrong, right?
> Otherwise, the returned TopFieldDocs.scoreDocs array will include the same
> document and score?
>
> I noticed there's no test case for that ... TopFieldCollector's
constructor
> is called only from IndexSearcher.search(Weight, Filter, int, Sort,
boolean
> /* fillFields */) which is called from IndexSearcher.search(Weight,
Filter,
> int, sort) with fillFields always set to true. So perhaps that's why this
> hasn't showed up?
>
> BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
> looks like this:
>
>     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
>     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
>     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
>       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
>
>     return new TopFieldDocs(totalHits, scoreDocs,
>                             fshq.getFields(), fshq.getMaxScore());
>
> It assumes fillFields is always true and always pops elements out of the
> queue.
>
> If this is a bug, I can fix it as part of 1575, as I'm touching that class
> anyway. I can also add a test case ... The fix is very simple BTW, just
move
> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside
the
> for loop.
>
> Shai
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

 


Re: Bug in TopFieldCollector?

Posted by Michael McCandless <lu...@mikemccandless.com>.
I agree, this is not a pleasant migration path forward from 2.4.

I think maybe a good fix is to not even require IndexReader[]
subReaders to be passed in, in the first place.  Tracing downwards,
the only reason why we needs this array at construction time is for
the SortField.CUSTOM case, when it calls
FieldComparatorSource.newComparator().  This is the new API for a
custom sort.  The old API took the toplevel IndexReader, so when we
carried over to the new API we replaced that with the array of
subreaders.

But, because FieldComparator is a more advanced API that steps through
each sub-reader as it goes, I don't think we need to provide the array
of subreaders up front.

So I think we should simply remove the IndexReader[] as an arg,
entirely?  It's a new (in 2.9) API so we are free to change it.

Mike

On Mon, Mar 30, 2009 at 4:20 AM, Shai Erera <se...@gmail.com> wrote:
> Already did !
>
> Another question - I think we somehow broke TopFieldCollector ...
> Previously, in TopFieldDocCollector, it accepted an IndexReader as a
> parameter, and now it requires IndexReader[], which is called subReaders.
> Calling the 'fast' search methods with Sort has no problem obtaining that
> IndexReader[] (and sort it), but how would someone use TopFieldCollector w/o
> calling the appropriate Searcher methods?
>
> For example, since all the Searcher methods pass in fillFields = true, I
> wanted to use the method Searcher.search(Query, TopFieldCollector) in the
> test case I wrote, which BTW looks like this:
>
>   public void testSortWithoutFillFields() throws Exception {
>
>     // There was previously a bug in TopFieldCollector when fillFields was
> set
>     // to false - the same doc and score was set in ScoreDoc[] array. This
> test
>     // asserts that if fillFields is false, the documents are set properly.
> It
>     // does not use Searcher's default search methods (with Sort) since all
> set
>     // fillFields to true.
>     Sort sort = new Sort();
>     int nDocs=10;
>
>     TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
>         new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
> false);
>
>     full.search(new MatchAllDocsQuery(), tdc);
>
>     ScoreDoc[] sd = tdc.topDocs().scoreDocs;
>     for (int i = 1; i < sd.length; i++) {
>       assertTrue(sd[i].doc != sd[i - 1].doc);
>     }
>   }
>
> You'll notice that creating a TopFieldCollector now is much more complicated
> and *ugly*. As a user of IndexSearcher, I can only call getIndexReader()
> which returns a single IndexReader. I don't have access to gatherSubReaders
> and sortSubReaders. I don't see why I should have access to them. So it
> forces me to create a dummy array with a single IndexReader.
>
> There are two ways I see to solve it:
> 1. Introduce a getIndexReaders() method on IndexSearcher, which will return
> an array of (sorted?) IndexReader.
> 2. Introduce a new constructor in TopFieldCollector which accepts a single
> IndexReader and make the other one package-private (for use by IndexSearcher
> only). That constructor can internally create a dummy array of readers, but
> at least it's private to the constructor and not exposed to the rest of the
> world.
>
> Otherwise, I think it ruins TopFieldCollector and will make it a lot less
> intuitive to use. At least, people who'd want to move from
> TopFieldDocCollector to TopFieldCollector, will find it very inconvenient
> and strange.
>
> What do you think? I can do that (2) as part of 1575. If (1) is better, then
> I think a different issue should be opened, because we might want to return
> such an array as sorted or something, which makes it less trivial.
>
> Shai
>
> On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
>> I would say add test case & fix it under 1575.
>>
>> Mike
>>
>> On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
>> > Hi
>> >
>> > As I prepared the patch for 1575, I noticed a strange implementation in
>> > TopFieldCollector's topDocs():
>> >
>> >     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
>> >     if (fillFields) {
>> >       for (int i = queue.size() - 1; i >= 0; i--) {
>> >         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
>> > queue.pop());
>> >       }
>> >     } else {
>> >       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
>> >       for (int i = queue.size() - 1; i >= 0; i--) {
>> >         scoreDocs[i] = new FieldDoc(entry.docID,
>> >                                     entry.score);
>> >       }
>> >     }
>> >
>> >     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
>> > maxScore);
>> >
>> >
>> > Notice that if fillFields is true, then documents are popped from the
>> > queue.
>> > However if it's false, then the first document is popped out of the
>> > queue
>> > and used to populate the entire ScoreDoc[]? I believe that's wrong,
>> > right?
>> > Otherwise, the returned TopFieldDocs.scoreDocs array will include the
>> > same
>> > document and score?
>> >
>> > I noticed there's no test case for that ... TopFieldCollector's
>> > constructor
>> > is called only from IndexSearcher.search(Weight, Filter, int, Sort,
>> > boolean
>> > /* fillFields */) which is called from IndexSearcher.search(Weight,
>> > Filter,
>> > int, sort) with fillFields always set to true. So perhaps that's why
>> > this
>> > hasn't showed up?
>> >
>> > BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
>> > looks like this:
>> >
>> >     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
>> >     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
>> >     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
>> >       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
>> >
>> >     return new TopFieldDocs(totalHits, scoreDocs,
>> >                             fshq.getFields(), fshq.getMaxScore());
>> >
>> > It assumes fillFields is always true and always pops elements out of the
>> > queue.
>> >
>> > If this is a bug, I can fix it as part of 1575, as I'm touching that
>> > class
>> > anyway. I can also add a test case ... The fix is very simple BTW, just
>> > move
>> > the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside
>> > the
>> > for loop.
>> >
>> > Shai
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Bug in TopFieldCollector?

Posted by Shai Erera <se...@gmail.com>.
Already did !

Another question - I think we somehow broke TopFieldCollector ...
Previously, in TopFieldDocCollector, it accepted an IndexReader as a
parameter, and now it requires IndexReader[], which is called subReaders.
Calling the 'fast' search methods with Sort has no problem obtaining that
IndexReader[] (and sort it), but how would someone use TopFieldCollector w/o
calling the appropriate Searcher methods?

For example, since all the Searcher methods pass in fillFields = true, I
wanted to use the method Searcher.search(Query, TopFieldCollector) in the
test case I wrote, which BTW looks like this:

  public void testSortWithoutFillFields() throws Exception {

    // There was previously a bug in TopFieldCollector when fillFields was
set
    // to false - the same doc and score was set in ScoreDoc[] array. This
test
    // asserts that if fillFields is false, the documents are set properly.
It
    // does not use Searcher's default search methods (with Sort) since all
set
    // fillFields to true.
    Sort sort = new Sort();
    int nDocs=10;

    TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
        new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
false);

    full.search(new MatchAllDocsQuery(), tdc);

    ScoreDoc[] sd = tdc.topDocs().scoreDocs;
    for (int i = 1; i < sd.length; i++) {
      assertTrue(sd[i].doc != sd[i - 1].doc);
    }
  }

You'll notice that creating a TopFieldCollector now is much more complicated
and *ugly*. As a user of IndexSearcher, I can only call getIndexReader()
which returns a single IndexReader. I don't have access to gatherSubReaders
and sortSubReaders. I don't see why I should have access to them. So it
forces me to create a dummy array with a single IndexReader.

There are two ways I see to solve it:
1. Introduce a getIndexReaders() method on IndexSearcher, which will return
an array of (sorted?) IndexReader.
2. Introduce a new constructor in TopFieldCollector which accepts a single
IndexReader and make the other one package-private (for use by IndexSearcher
only). That constructor can internally create a dummy array of readers, but
at least it's private to the constructor and not exposed to the rest of the
world.

Otherwise, I think it ruins TopFieldCollector and will make it a lot less
intuitive to use. At least, people who'd want to move from
TopFieldDocCollector to TopFieldCollector, will find it very inconvenient
and strange.

What do you think? I can do that (2) as part of 1575. If (1) is better, then
I think a different issue should be opened, because we might want to return
such an array as sorted or something, which makes it less trivial.

Shai

On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
> I would say add test case & fix it under 1575.
>
> Mike
>
> On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
> > Hi
> >
> > As I prepared the patch for 1575, I noticed a strange implementation in
> > TopFieldCollector's topDocs():
> >
> >     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
> >     if (fillFields) {
> >       for (int i = queue.size() - 1; i >= 0; i--) {
> >         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
> > queue.pop());
> >       }
> >     } else {
> >       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
> >       for (int i = queue.size() - 1; i >= 0; i--) {
> >         scoreDocs[i] = new FieldDoc(entry.docID,
> >                                     entry.score);
> >       }
> >     }
> >
> >     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
> > maxScore);
> >
> >
> > Notice that if fillFields is true, then documents are popped from the
> queue.
> > However if it's false, then the first document is popped out of the queue
> > and used to populate the entire ScoreDoc[]? I believe that's wrong,
> right?
> > Otherwise, the returned TopFieldDocs.scoreDocs array will include the
> same
> > document and score?
> >
> > I noticed there's no test case for that ... TopFieldCollector's
> constructor
> > is called only from IndexSearcher.search(Weight, Filter, int, Sort,
> boolean
> > /* fillFields */) which is called from IndexSearcher.search(Weight,
> Filter,
> > int, sort) with fillFields always set to true. So perhaps that's why this
> > hasn't showed up?
> >
> > BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
> > looks like this:
> >
> >     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
> >     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
> >     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
> >       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
> >
> >     return new TopFieldDocs(totalHits, scoreDocs,
> >                             fshq.getFields(), fshq.getMaxScore());
> >
> > It assumes fillFields is always true and always pops elements out of the
> > queue.
> >
> > If this is a bug, I can fix it as part of 1575, as I'm touching that
> class
> > anyway. I can also add a test case ... The fix is very simple BTW, just
> move
> > the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside
> the
> > for loop.
> >
> > Shai
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: Bug in TopFieldCollector?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
I would say add test case & fix it under 1575.

Mike

On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <se...@gmail.com> wrote:
> Hi
>
> As I prepared the patch for 1575, I noticed a strange implementation in
> TopFieldCollector's topDocs():
>
>     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
>     if (fillFields) {
>       for (int i = queue.size() - 1; i >= 0; i--) {
>         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
> queue.pop());
>       }
>     } else {
>       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
>       for (int i = queue.size() - 1; i >= 0; i--) {
>         scoreDocs[i] = new FieldDoc(entry.docID,
>                                     entry.score);
>       }
>     }
>
>     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
> maxScore);
>
>
> Notice that if fillFields is true, then documents are popped from the queue.
> However if it's false, then the first document is popped out of the queue
> and used to populate the entire ScoreDoc[]? I believe that's wrong, right?
> Otherwise, the returned TopFieldDocs.scoreDocs array will include the same
> document and score?
>
> I noticed there's no test case for that ... TopFieldCollector's constructor
> is called only from IndexSearcher.search(Weight, Filter, int, Sort, boolean
> /* fillFields */) which is called from IndexSearcher.search(Weight, Filter,
> int, sort) with fillFields always set to true. So perhaps that's why this
> hasn't showed up?
>
> BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
> looks like this:
>
>     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
>     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
>     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in array
>       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
>
>     return new TopFieldDocs(totalHits, scoreDocs,
>                             fshq.getFields(), fshq.getMaxScore());
>
> It assumes fillFields is always true and always pops elements out of the
> queue.
>
> If this is a bug, I can fix it as part of 1575, as I'm touching that class
> anyway. I can also add a test case ... The fix is very simple BTW, just move
> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside the
> for loop.
>
> Shai
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org