You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/09/14 21:19:03 UTC

[GitHub] [lucene] gsmiller commented on a change in pull request #293: Lucene-10070: Skip deleted documents during facet counting for all do…

gsmiller commented on a change in pull request #293:
URL: https://github.com/apache/lucene/pull/293#discussion_r708632754



##########
File path: lucene/core/src/java/org/apache/lucene/search/ConjunctionUtils.java
##########
@@ -97,4 +99,46 @@ public static void addIterator(
       List<TwoPhaseIterator> twoPhaseIterators) {
     ConjunctionDISI.addIterator(disi, allIterators, twoPhaseIterators);
   }
+
+  /**
+   * Wrap the given DocIdSetIterator and liveDocs into another DocIdSetIterator that returns
+   * non-deleted documents during iteration. This is useful for match-all style queries that need to
+   * exclude deleted documents.
+   *
+   * @param it {@link DocIdSetIterator} being wrapped
+   * @param liveDocs {@link Bits} containing set bits for non-deleted docs
+   * @return wrapped iterator
+   */
+  public static DocIdSetIterator liveDocsDISI(DocIdSetIterator it, Bits liveDocs) {

Review comment:
       Hmm, a couple thoughts here.
   1. I really like this approach of creating a DISI that accounts for deleted docs! Clean way of solving this.
   2. I wonder if this is general-purpose enough to expose here, or if we should keep it just for faceting for now (e.g., in a pkg-private class within the faceting module). Might be worth waiting to expose it in a public API until there are more specific use-cases to justify it.
   3. Could you take a look at the way `ConjunctionDISI` implements next/advance? I think it might be a good pattern to follow in this implementation, specifically the [doNext](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java#L166) pattern. I'm not 100% sure, but I _think_ this might explode if you called advance for a doc that's not present, that sends you to NO_MORE_DOCS. (Generalizing this in a public API will require rigorous testing and handling use-cases that may not come up in faceting).

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -152,7 +153,8 @@ private FacetResult getDim(String dim, OrdRange ordRange, int topN) throws IOExc
   }
 
   private void countOneSegment(
-      OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits) throws IOException {

Review comment:
       I like how you were able to extend this functionality to work for both the "count all" and "typical" counting cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org