You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "gsmiller (via GitHub)" <gi...@apache.org> on 2023/11/29 18:14:11 UTC

[PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

gsmiller opened a new pull request, #12853:
URL: https://github.com/apache/lucene/pull/12853

   In GH#12558, @Shradha26 discovered that faceting implementations are sometimes getting empty MatchingDoc lists, resulting in undesired behavior. We should consider it a bug for MatchingDocs to not be populated, even if there are no collected hits. After chasing the issue a bit, it appears that drill-sideways can miss calling `#finish` on the FacetsCollectors it manages if it end up not scoring any docs. This PR addresses it.
   
   I made a couple related changes in this PR to get at fixing the bug:
   1. Added `DrillSideways#createDrillSidewaysFacetsCollectorManager` as a protected method that users can extend to create specific facet collecting implementations. This seems reasonable to me as we already have a similar hook for the drill-down collector. I found this hook useful for testing. If we don't want to expose this API surface, I can adjust the testing approach, but I think it's reasonable to expose personally.
   2. Changed the `DrillSidewaysScorer` to accept `LeafCollectors` instead of `FacetCollectors`. This seems like a better division of responsibilities anyway, and is simpler, but it also lets `DrillSidewaysQuery` manage the creation of the leaf collectors when setting up the `BulkScorer`. This is necessary if `DrillSidewaysQuery` is going to call `#finish` in no-scoring cases since `#finish` relies on a leaf scorer having been created already from the collector.
   3. I also added a TODO to go remove `DrillSideways#createDrillDownFacetsCollector` since it is no longer used and is a trappy hook for users. I'll tackle this in a separate PR though since we need a deprecation back port and it's not really related to this (just something I came across).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on code in PR #12853:
URL: https://github.com/apache/lucene/pull/12853#discussion_r1421103992


##########
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
##########
@@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
           FacetsCollector sidewaysCollector = drillSidewaysCollectorManagers[dim].newCollector();
           sidewaysCollectors[dim] = sidewaysCollector;
 
-          dims[dim] = new DrillSidewaysScorer.DocsAndCost(scorer, sidewaysCollector);
+          dims[dim] =
+              new DrillSidewaysScorer.DocsAndCost(
+                  scorer, sidewaysCollector.getLeafCollector(context));
         }
 
-        // If more than one dim has no matches, then there
-        // are no hits nor drill-sideways counts.  Or, if we
-        // have only one dim and that dim has no matches,
-        // same thing.

Review Comment:
   Yeah, they're not :). We _do_ need to still run the query in the case where there's a single dim but it's null because we need to collect all the "sideways" hits for that null dim. Looks like this comment (along with the commented out condition) came from @mikemccand back in 2014. I doubt he recalls much 9 years later. Maybe he realized this is a bug, commented out the condition (and replaced it with the correct one) and left the comment untouched? That's my best guess. But no need to keep the comment around anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on code in PR #12853:
URL: https://github.com/apache/lucene/pull/12853#discussion_r1409693152


##########
lucene/CHANGES.txt:
##########
@@ -112,6 +112,9 @@ Bug Fixes
 
 * GITHUB#12220: Hunspell: disallow hidden title-case entries from compound middle/end
 
+* GITHUB#12558: Ensure #finish is called on all drill-sideways FacetsCollectors even when no hits are scored.

Review Comment:
   Put this under 10.0 for now. After 9.9, if we add a section for 9.10 I will back port it there (no reason not to, we just don't have the section setup). Also, if 9.9 re-spins I will back port to 9.9, but we shouldn't block that release for this fix IMO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gautamworah96 (via GitHub)" <gi...@apache.org>.
gautamworah96 commented on code in PR #12853:
URL: https://github.com/apache/lucene/pull/12853#discussion_r1421077089


##########
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
##########
@@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
           FacetsCollector sidewaysCollector = drillSidewaysCollectorManagers[dim].newCollector();
           sidewaysCollectors[dim] = sidewaysCollector;
 
-          dims[dim] = new DrillSidewaysScorer.DocsAndCost(scorer, sidewaysCollector);
+          dims[dim] =
+              new DrillSidewaysScorer.DocsAndCost(
+                  scorer, sidewaysCollector.getLeafCollector(context));
         }
 
-        // If more than one dim has no matches, then there
-        // are no hits nor drill-sideways counts.  Or, if we
-        // have only one dim and that dim has no matches,
-        // same thing.

Review Comment:
   unrelated but how exactly is `(nullCount > 1 || (nullCount == 1 && dims.length == 1))` equivalent to `nullCount > 1`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on PR #12853:
URL: https://github.com/apache/lucene/pull/12853#issuecomment-1847976009

   Thanks @gautamworah96  for taking a look!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on code in PR #12853:
URL: https://github.com/apache/lucene/pull/12853#discussion_r1409693152


##########
lucene/CHANGES.txt:
##########
@@ -112,6 +112,9 @@ Bug Fixes
 
 * GITHUB#12220: Hunspell: disallow hidden title-case entries from compound middle/end
 
+* GITHUB#12558: Ensure #finish is called on all drill-sideways FacetsCollectors even when no hits are scored.

Review Comment:
   Put this under 10.0 for now. After 9.9, if we add a section for 9.10 I will back port it there (no reason not to, we just don't have the section setup). Also, if 9.9 re-spins I will back port to 9.9, but we shouldn't block that release for this fix IMO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gautamworah96 (via GitHub)" <gi...@apache.org>.
gautamworah96 commented on code in PR #12853:
URL: https://github.com/apache/lucene/pull/12853#discussion_r1421069686


##########
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
##########
@@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
           FacetsCollector sidewaysCollector = drillSidewaysCollectorManagers[dim].newCollector();
           sidewaysCollectors[dim] = sidewaysCollector;
 
-          dims[dim] = new DrillSidewaysScorer.DocsAndCost(scorer, sidewaysCollector);
+          dims[dim] =
+              new DrillSidewaysScorer.DocsAndCost(
+                  scorer, sidewaysCollector.getLeafCollector(context));
         }
 
-        // If more than one dim has no matches, then there
-        // are no hits nor drill-sideways counts.  Or, if we
-        // have only one dim and that dim has no matches,
-        // same thing.
-        // if (nullCount > 1 || (nullCount == 1 && dims.length == 1)) {
-        if (nullCount > 1) {
+        // If baseScorer is null or the dim nullCount > 1, then we have nothing to score. We return
+        // a null scorer in this case, but we need to make sure #finish gets called on all facet
+        // collectors since IndexSearcher won't handle this for us:
+        if (baseScorer == null || nullCount > 1) {
+          if (drillDownCollector != null) {
+            drillDownCollector.finish();
+          }
+          for (FacetsCollector fc : sidewaysCollectors) {
+            fc.finish();
+          }
           return null;
         }
 
         // Sort drill-downs by most restrictive first:
-        Arrays.sort(
-            dims,
-            new Comparator<DrillSidewaysScorer.DocsAndCost>() {
-              @Override
-              public int compare(DocsAndCost o1, DocsAndCost o2) {
-                return Long.compare(o1.approximation.cost(), o2.approximation.cost());
-              }
-            });
-
-        if (baseScorer == null) {
-          return null;
-        }
-
-        FacetsCollector drillDownCollector;
-        if (drillDownCollectorManager != null) {
-          drillDownCollector = drillDownCollectorManager.newCollector();
-          managedDrillDownCollectors.add(drillDownCollector);
-        } else {
-          drillDownCollector = null;
-        }
+        Arrays.sort(dims, Comparator.comparingLong(o -> o.approximation.cost()));

Review Comment:
   Thanks for these small cleanups as well. LGTM



##########
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
##########
@@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
           FacetsCollector sidewaysCollector = drillSidewaysCollectorManagers[dim].newCollector();
           sidewaysCollectors[dim] = sidewaysCollector;
 
-          dims[dim] = new DrillSidewaysScorer.DocsAndCost(scorer, sidewaysCollector);
+          dims[dim] =
+              new DrillSidewaysScorer.DocsAndCost(
+                  scorer, sidewaysCollector.getLeafCollector(context));
         }
 
-        // If more than one dim has no matches, then there
-        // are no hits nor drill-sideways counts.  Or, if we
-        // have only one dim and that dim has no matches,
-        // same thing.

Review Comment:
   unlrelated but how exactly is `(nullCount > 1 || (nullCount == 1 && dims.length == 1))` equivalent to `nullCount > 1`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller merged PR #12853:
URL: https://github.com/apache/lucene/pull/12853


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org