Posted to dev@lucene.apache.org by Chris Russell <Ch...@careerbuilder.com> on 2014/05/01 23:36:08 UTC

Issue with functions that require metadata, and LeafCollectors

Hi.
I have opened an issue on Jira about improving the scale() function: https://issues.apache.org/jira/browse/LUCENE-5637

I was able to improve the performance of the scale function quite a bit, but this required me to refactor some code in IndexSearcher.search.
There is a loop where a scorer is created for each AtomicReaderContext and then used to score documents. In 4.8 it looks like this:
    for (AtomicReaderContext ctx : leaves) { // search each subreader
      try {
        collector.setNextReader(ctx);
      [...]
      BulkScorer scorer = weight.bulkScorer(ctx, !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      if (scorer != null) {
        try {
          scorer.score(collector);
        [...]
    }

I was able to break this up into two for-loops. This was necessary because the scale function needs to see every AtomicReaderContext before it is asked to score any documents. That way it can determine the scale constant without grabbing the top-level reader and looking at every document in the index (the previous behavior).
The new loops look like this in 4.8:

    ArrayList<BulkScorer> scorers = new ArrayList<BulkScorer>();
    // first loop: create a scorer for each subreader before scoring anything
    for (AtomicReaderContext ctx : leaves) {
      BulkScorer scorer = weight.bulkScorer(ctx, !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      scorers.add(scorer);
    }

    // second loop: score each subreader with the scorer built for it above
    for (int i = 0; i < leaves.size(); i++) {
      BulkScorer scorer = scorers.get(i);
      AtomicReaderContext ctx = leaves.get(i);
      try {
        collector.setNextReader(ctx);
      [...]
      if (scorer != null) {
        try {
          scorer.score(collector);
        [...]
    }

This seems to work fine and allows the function to gather the metadata it needs.
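To illustrate why every leaf has to be visited before any document is scored, here is a minimal, self-contained Java sketch of the two-phase pattern. It does not use Lucene at all: each leaf is modeled as a plain float array, and the names TwoPhaseScale, LeafScorer and MinMax are hypothetical. It assumes a simple linear min/max mapping into [targetMin, targetMax], in the spirit of scale(); the real ScaleFloatFunction may differ in details such as how a constant range is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (no Lucene classes) of a two-phase "scale": phase 1 visits every
 * leaf to gather global min/max, phase 2 scores using those constants.
 */
public class TwoPhaseScale {

    /** Hypothetical per-leaf "scorer": raw function values for one segment. */
    static final class LeafScorer {
        final float[] rawValues;
        LeafScorer(float[] rawValues) { this.rawValues = rawValues; }
    }

    /** Global metadata gathered in phase 1. */
    static final class MinMax {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        void accept(float v) { min = Math.min(min, v); max = Math.max(max, v); }
    }

    /** Maps v from [src.min, src.max] into [targetMin, targetMax]; a constant range maps to targetMin. */
    static float scale(float v, MinMax src, float targetMin, float targetMax) {
        float range = src.max - src.min;
        float factor = range == 0f ? 0f : (targetMax - targetMin) / range;
        return (v - src.min) * factor + targetMin;
    }

    static List<float[]> search(List<float[]> leaves, float targetMin, float targetMax) {
        // Phase 1: create a scorer for every leaf and fold its values into the
        // global statistics before any document is scored.
        List<LeafScorer> scorers = new ArrayList<>();
        MinMax stats = new MinMax();
        for (float[] leaf : leaves) {
            LeafScorer s = new LeafScorer(leaf);
            for (float v : s.rawValues) stats.accept(v);
            scorers.add(s);
        }
        // Phase 2: score each leaf with the now-complete global constants.
        List<float[]> results = new ArrayList<>();
        for (LeafScorer s : scorers) {
            float[] out = new float[s.rawValues.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = scale(s.rawValues[i], stats, targetMin, targetMax);
            }
            results.add(out);
        }
        return results;
    }

    public static void main(String[] args) {
        List<float[]> leaves = Arrays.asList(new float[] {2f, 4f}, new float[] {6f, 10f});
        for (float[] scored : search(leaves, 0f, 1f)) {
            System.out.println(Arrays.toString(scored)); // [0.0, 0.25] then [0.5, 1.0]
        }
    }
}
```

With a single loop, the first leaf would be scaled while the global maximum (10 here) was still unknown, which is the ordering problem the split loops avoid.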

When trying to bring my code to trunk, I ran into an issue with the recently introduced LeafCollector interface.
It seems that setNextReader no longer exists, and scorer.score now takes a LeafCollector.
In trunk, when I try to break this for-loop into two for-loops, a lot of unit tests fail.
I need the LeafCollectors in the first loop, where I am making the scorers, because LeafCollector now has the acceptsDocsOutOfOrder method.
I need them in the second loop because that is what score() now takes.
So I tried keeping track of the LeafCollectors I created in the first loop and reusing them in the second, which did not work.
I also tried asking the collector for new LeafCollectors in each of the two loops, and that did not work either.

I suspect this is because setNextReader went away, and there is some side effect of creating a LeafCollector and not immediately scoring with it. Does asking the passed-in collector for another LeafCollector, for some other context, do something to the previous LeafCollector?
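One plausible way that requesting a new LeafCollector can break an earlier one is shared per-leaf state on the parent collector. The sketch below is a hypothetical toy model, not Lucene's real classes, and it is not a diagnosis of the failing tests: SharedStateCollector keeps docBase in a field that every getLeafCollector call overwrites, while PerLeafStateCollector captures it per leaf. Creating all leaf collectors up front only breaks the former.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Lucene's classes): a leaf collector that reads
 * state its parent overwrites whenever another leaf collector is requested.
 */
public class SharedLeafStateDemo {

    interface LeafCollector { void collect(int segmentDoc); }

    /** Reads docBase from a field shared by all of its leaf collectors. */
    static final class SharedStateCollector {
        final List<Integer> globalIds = new ArrayList<>();
        private int docBase; // overwritten on every getLeafCollector call

        LeafCollector getLeafCollector(int leafDocBase) {
            docBase = leafDocBase;
            return new LeafCollector() {
                @Override
                public void collect(int segmentDoc) {
                    globalIds.add(docBase + segmentDoc); // whatever docBase is *now*
                }
            };
        }
    }

    /** Captures docBase per leaf, so creating leaf collectors early is harmless. */
    static final class PerLeafStateCollector {
        final List<Integer> globalIds = new ArrayList<>();

        LeafCollector getLeafCollector(final int leafDocBase) {
            return new LeafCollector() {
                @Override
                public void collect(int segmentDoc) {
                    globalIds.add(leafDocBase + segmentDoc);
                }
            };
        }
    }

    static final int[] DOC_BASES = {0, 100}; // two segments

    /** Two-loop pattern: create every leaf collector first, then collect doc 5 from each leaf. */
    static List<Integer> sharedUpFront() {
        SharedStateCollector c = new SharedStateCollector();
        List<LeafCollector> leafCollectors = new ArrayList<>();
        for (int base : DOC_BASES) leafCollectors.add(c.getLeafCollector(base));
        for (LeafCollector lc : leafCollectors) lc.collect(5);
        return c.globalIds;
    }

    static List<Integer> perLeafUpFront() {
        PerLeafStateCollector c = new PerLeafStateCollector();
        List<LeafCollector> leafCollectors = new ArrayList<>();
        for (int base : DOC_BASES) leafCollectors.add(c.getLeafCollector(base));
        for (LeafCollector lc : leafCollectors) lc.collect(5);
        return c.globalIds;
    }

    public static void main(String[] args) {
        System.out.println(sharedUpFront());  // [105, 105]: leaf 0's doc gets leaf 1's base
        System.out.println(perLeafUpFront()); // [5, 105]
    }
}
```

If a real collector behaves like the shared-state variant, the first loop would leave it pointing at the last leaf, which would be consistent with the symptoms described, but that is an assumption to check against the actual collector code.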

All I am trying to do is create all of the scorers before using any of them, which seems like it should be logically possible. This is especially useful for functions that require metadata.
Any assistance would be appreciated.

-Chris

Re: Issue with functions that require metadata, and LeafCollectors

Posted by shikhar <sh...@schmizz.net>.

On Fri, May 2, 2014 at 3:06 AM, Chris Russell <Chris.Russell@careerbuilder.com> wrote:

> I need the LeafCollectors in the first loop where I am making the scorers
> because LeafCollector now has the acceptsDocsOutOfOrder method.
>
I wonder if the answer here is that acceptsDocsOutOfOrder() should live on
the Collector rather than the LeafCollector. Are there cases where that
does not make sense?
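A minimal sketch of what that change would enable, using hypothetical simplified types rather than Lucene's real interfaces: if acceptsDocsOutOfOrder() can be answered by the Collector itself, the first loop can build every scorer without touching a LeafCollector, and the second loop can request each LeafCollector immediately before scoring its leaf. That keeps the one-leaf-at-a-time order a collector expects. The bulkScorer helper below is only a stand-in for weight.bulkScorer(...), not its real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical simplified types (not Lucene's API): a two-loop search that is
 * straightforward when acceptsDocsOutOfOrder() lives on the Collector.
 */
public class CollectorLevelOrderFlag {

    interface LeafCollector { void collect(int doc); }

    interface Collector {
        boolean acceptsDocsOutOfOrder();             // answerable without any leaf
        LeafCollector getLeafCollector(int leafOrd); // requested right before scoring
    }

    /** Toy per-leaf scorer that replays fixed doc ids. */
    static final class LeafScorer {
        final int[] docs;
        final boolean inOrder; // ordering requested when the scorer was built
        LeafScorer(int[] docs, boolean inOrder) { this.docs = docs; this.inOrder = inOrder; }
        void score(LeafCollector lc) { for (int d : docs) lc.collect(d); }
    }

    /** Stand-in for weight.bulkScorer(ctx, scoreDocsInOrder, liveDocs). */
    static LeafScorer bulkScorer(int[] leafDocs, boolean scoreDocsInOrder) {
        return new LeafScorer(leafDocs.clone(), scoreDocsInOrder);
    }

    static List<LeafScorer> search(List<int[]> leaves, Collector collector) {
        // Loop 1: build every scorer up front; only the Collector is consulted.
        List<LeafScorer> scorers = new ArrayList<>();
        for (int[] leaf : leaves) {
            scorers.add(bulkScorer(leaf, !collector.acceptsDocsOutOfOrder()));
        }
        // A metadata-hungry function could inspect all leaves here, before any scoring.

        // Loop 2: get each LeafCollector immediately before scoring its leaf.
        for (int i = 0; i < leaves.size(); i++) {
            scorers.get(i).score(collector.getLeafCollector(i));
        }
        return scorers;
    }

    /** Runs the search with an in-order collector that records "leaf:doc" pairs. */
    static List<String> demo() {
        final List<String> collected = new ArrayList<>();
        Collector collector = new Collector() {
            @Override
            public boolean acceptsDocsOutOfOrder() { return false; }

            @Override
            public LeafCollector getLeafCollector(final int leafOrd) {
                return new LeafCollector() {
                    @Override
                    public void collect(int doc) { collected.add(leafOrd + ":" + doc); }
                };
            }
        };
        List<LeafScorer> scorers = search(Arrays.asList(new int[] {0, 3}, new int[] {1}), collector);
        for (LeafScorer s : scorers) {
            if (!s.inOrder) throw new IllegalStateException("expected in-order scorers");
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [0:0, 0:3, 1:1]
    }
}
```

The design point is that the order flag is a property of the whole collection, so asking it per leaf forces a LeafCollector to exist before its leaf is actually being scored.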