Posted to dev@lucene.apache.org by GitBox <gi...@apache.org> on 2019/04/15 18:32:35 UTC

[GitHub] [lucene-solr] msokolov commented on a change in pull request #579: LUCENE-8681: prorated early termination

msokolov commented on a change in pull request #579: LUCENE-8681: prorated early termination
URL: https://github.com/apache/lucene-solr/pull/579#discussion_r275490738
 
 

 ##########
 File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java
 ##########
 @@ -165,11 +169,35 @@ public void collect(int doc) throws IOException {
               updateMinCompetitiveScore(scorer);
             }
           }
+          if (canEarlyTerminate) {
+              // When early terminating, stop collecting hits from this leaf once we have its prorated hits.
+              if (leafHits > leafHitsThreshold) {
+                  totalHitsRelation = Relation.GREATER_THAN_OR_EQUAL_TO;
+                  throw new CollectionTerminatedException();
+              }
+          }
         }
 
       };
     }
 
+    /** The total number of documents that matched this query; may be a lower bound in case of early termination. */
+    @Override
+    public int getTotalHits() {
+      return totalHits;
+    }
+
+    private int prorateForSegment(int topK, LeafReaderContext leafCtx) {
+        // prorate number of hits to collect based on proportion of documents in this leaf (segment).
+        // p := probability of a top-k document (or any document) being in this segment
+        double p = (double) leafCtx.reader().numDocs() / leafCtx.parent.reader().numDocs();
 
 Review comment:
   I added a divide-by-zero check, and I think we were always using `numDocs()`, so this should be good.
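
   For readers following along, here is a minimal sketch of what such a guard might look like. It is not the code from the pull request: the class `ProrateSketch`, the method `prorate`, and its plain `int` parameters are hypothetical stand-ins for the `numDocs()` values of the leaf and its parent reader, and both the simple proportional share and the fallback to the full `topK` for an empty index are assumptions made for illustration. Note that in Java a `double` division by zero does not throw; it yields `NaN` or `Infinity`, which the `(int)` cast then turns into `0` or `Integer.MAX_VALUE`, so an explicit check is the clearer option.

```java
// Hypothetical illustration only; not the implementation in PR #579.
final class ProrateSketch {
    private ProrateSketch() {}

    /**
     * Returns the share of topK hits to collect from one segment.
     *
     * @param topK         number of top hits requested for the whole index
     * @param leafNumDocs  live documents in this segment (assumed from numDocs())
     * @param totalNumDocs live documents in the whole index (assumed from numDocs())
     */
    static int prorate(int topK, int leafNumDocs, int totalNumDocs) {
        if (totalNumDocs <= 0) {
            // Guard against 0/0: with no live documents there is nothing to
            // prorate, so fall back to the unprorated topK (an assumption).
            return topK;
        }
        // p := probability of a top-k document (or any document) being in this segment
        double p = (double) leafNumDocs / totalNumDocs;
        // Simple proportional share, rounded up so a non-empty segment gets at least one hit.
        return (int) Math.ceil(p * topK);
    }
}
```

   With this sketch, a segment holding half of the live documents is asked for half of the requested hits, and an index with no live documents never reaches the division.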
