You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/09/04 11:50:42 UTC

[GitHub] [lucene] gsmiller commented on a change in pull request #282: Lucene-10070

gsmiller commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r702274019



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##########
@@ -159,13 +160,15 @@ private FacetResult getDim(String dim, OrdRange ordRange, int topN) throws IOExc
     final MatchingDocs hits;
     final OrdinalMap ordinalMap;
     final int segOrd;
+    final Bits liveDocs;
 
     public CountOneSegment(
         LeafReader leafReader, MatchingDocs hits, OrdinalMap ordinalMap, int segOrd) {
       this.leafReader = leafReader;
       this.hits = hits;
       this.ordinalMap = ordinalMap;
       this.segOrd = segOrd;
+      this.liveDocs = (leafReader != null) ? leafReader.getLiveDocs() : null;

Review comment:
       I wonder if it makes sense to just `assert leafReader != null` at the top of this ctor? If it's `null`, when `call()` gets invoked it will throw an NPE anyway (down in `DocValues.getSortedSet`), so might be nice to fail early on this. What do you think?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -152,7 +153,8 @@ private FacetResult getDim(String dim, OrdRange ordRange, int topN) throws IOExc
   }
 
   private void countOneSegment(
-      OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits) throws IOException {
+      OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits, Bits liveDocs)

Review comment:
       I left a similar comment in the concurrent version of this, but I think we should specialize the count-all case so we're not checking `liveDocs` in the common case where `hits` is non-null (since `hits` will already do that check essentially, and should only be providing live docs).

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##########
@@ -207,11 +210,17 @@ public Void call() throws IOException {
           // Remap every ord to global ord as we iterate:
           if (singleValues != null) {
             for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+              if (liveDocs != null && liveDocs.get(doc) == false) {

Review comment:
       Hmm, this is a little tricky. We only need to check `liveDocs` if `hits` is `null` (i.e., we're counting "all" docs). If `hits` is non-null, then we know it's only providing "live" docs already, so we don't need to re-check. Other `Facets` implementations actually provide a separate implementation / code-path for the "count all" case to specialize, and I wonder if we should do the same here. The code will end up being fairly duplicated (like elsewhere), but it would be more efficient for the common case where `hits` is non-null.

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java
##########
@@ -272,7 +273,8 @@ private void count(FacetsCollector facetsCollector) throws IOException {
       // Assuming the state is valid, ordinalMap should be null since we have one segment:
       assert ordinalMap == null;
 
-      countOneSegment(docValues, hits.context.ord, hits);
+      // hits contain live documents only, no need to pass live docs bitset explicitly
+      countOneSegment(docValues, hits.context.ord, hits, null);

Review comment:
       Similar comment to what I left in `ConcurrentSortedSetDocValueFacetCounts` about whether-or-not it makes sense to split out a separate implementation for the count-all case. I think we should even though there might be significant code duplication. Checking liveness isn't necessary when `hits` is non-null, and will just add overhead to the common case. Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org