You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/06/28 01:00:12 UTC

[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r907916354


##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }
 
-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));

Review Comment:
   I wonder if we should only add to the pq when the count is > 0 to be consistent with other Facet implementations. What do you think?



##########
lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java:
##########
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }
 
   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {

Review Comment:
   I'm not totally sold we need to demo the `getTopChildren` functionality. It feels like it will be a little obscure for range faceting to me. What do you think of just changing the existing example code in-place to use `getAllChildren` instead of `getTopChildren` since that's probably the more common use-case? Curious what you think though. Do you think we should demo `getTopChildren` as well?



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }
 
-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));
+      } else {
+        int topValue = pq.top().value.intValue();
+        if (counts[i] > topValue
+            || (counts[i] == topValue && ranges[i].label.compareTo(pq.top().label) < 0)) {
+          pq.updateTop(new LabelAndValue(ranges[i].label, counts[i]));
+        }
+      }

Review Comment:
   Ideally, you'd do something like this here to avoid creating unnecessary garbage:
   
   ```
   } else {
           LabelAndValue top = pq.top();
           assert top != null;
           int topValue = top.value.intValue();
           if (counts[i] > topValue || (counts[i] == topValue && ranges[i].label.compareTo(top.label) < 0)) {
             top.label = ranges[i].label;
             top.value = counts[i];
             pq.updateTop();
           }
         }
   ```
   
   ... but of course LabelAndValue is immutable. The way we do these elsewhere is to use some internal class to just store the pq entries, and then translate into `LabelAndValue` instances at the return. Have a look at `LongValueFacetCount#getTopChildren` as an example.
   
   I would suggest following this similar approach for consistency here. Of course, this might actually create more garbage if top-n and the total number of ranges are very similar. So there's probably an optimization opportunity here where we implement this differently depending on the ratio of top-n to the total number of ranges (but I would save that for a separate issue/exploration).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org