You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/20 23:27:31 UTC

[GitHub] [lucene] Yuti-G opened a new pull request, #914: LUCENE-10550: Add getAllChildren functionality to facets

Yuti-G opened a new pull request, #914:
URL: https://github.com/apache/lucene/pull/914

   # Description
   
   This new API returns all children for a specified path and allows users to sort in their desired criteria
   
   # Solution
   
   * Added getAllChildren function in the Facets class.
   * Override getAllChildren in subclasses
   
   # Tests
   
   Added new testing for all the overridden implementations of getAllChildren in Facets
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
   - [X] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r899773538


##########
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:
##########
@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    List<LabelAndValue> labelValues = new ArrayList<>();
+    boolean countsAdded = false;
+    if (hashCounts.size() != 0) {
+      for (LongIntCursor c : hashCounts) {
+        int count = c.value;
+        if (count != 0) {
+          if (countsAdded == false && c.key >= counts.length) {
+            countsAdded = true;
+            appendCounts(labelValues);
+          }
+          labelValues.add(new LabelAndValue(Long.toString(c.key), count));
+        }
+      }
+    }
+
+    if (countsAdded == false) {
+      appendCounts(labelValues);
+    }
+
+    return new FacetResult(
+        field,
+        new String[0],
+        totCount,
+        labelValues.toArray(new LabelAndValue[0]),
+        labelValues.size());

Review Comment:
   Thank you so much for providing a simplified logic here!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on PR #914:
URL: https://github.com/apache/lucene/pull/914#issuecomment-1162454475

   Thank you so much for the last check! I added more javadoc and a new entry to the CHANGES.txt. For back-porting, should I wait until this PR is merged and checkout a new branch against the latest branch_9x to cherrypick the merged commit? Thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427716


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   `getBulkPath` only takes in int array and I would need to cast `List<Integer>` to `int[]` here. Please advise me if there is a cleaner way to do so. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427716


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   `getBulkPath` only takes in int array and I would need to cast `List<Integer>` to `int[]` here, and therefore, I can't just use ordinals directly for getBulkPath. Please advise me if there is a cleaner way to do so. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427946


##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##########
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);
+      return getPathResult(dimConfig, dim, path, pathOrd, dimTree.iterator(pathOrd));
+    } else {
+      if (path.length > 0) {
+        throw new IllegalArgumentException(
+            "Field is not configured as hierarchical, path should be 0 length");
+      }
+      OrdRange ordRange = state.getOrdRange(dim);
+      if (ordRange == null) {
+        // means dimension was never indexed
+        return null;
+      }
+      int dimOrd = ordRange.start;
+      PrimitiveIterator.OfInt childIt = ordRange.iterator();
+      if (dimConfig.multiValued && dimConfig.requireDimCount) {
+        // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
+        // the dimension and we need to skip past it so the iterator is positioned on the first
+        // child:
+        childIt.next();
+      }
+      return getPathResult(dimConfig, dim, null, dimOrd, childIt);
+    }
+  }

Review Comment:
   Thanks for the suggestion!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427716


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   `getBulkPath` only takes in int array and I would need to cast List<Integer> to int[] here. Please advise me if there is a cleaner way to do so. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #914:
URL: https://github.com/apache/lucene/pull/914#issuecomment-1162469235

   > For back-porting, should I wait until this PR is merged and checkout a new branch against the latest branch_9x to cherrypick the merged commit? Thanks again!
   
   Exactly! Then you can open a PR with that branch against `origin/branch_9x` (github will automatically select `origin/main` as the suggested destination so just change that). And just mention that it's a backport PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #914:
URL: https://github.com/apache/lucene/pull/914#issuecomment-1162469491

   Merged onto `main`. Thanks again @Yuti-G! Exciting to see this new functionality available :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] Yuti-G commented on pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on PR #914:
URL: https://github.com/apache/lucene/pull/914#issuecomment-1159161102

   Thanks @gsmiller for spending time reviewing my PR and leaving the great feedback! I addressed all of the comments except for the one that requires casting `List<Integer>` to `int[]` for `ordinals`  to call `getBulkPath` in the IntTaxonomy and FloatTaxonomy. Please let me know if there is a better way to do it. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller merged pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
gsmiller merged PR #914:
URL: https://github.com/apache/lucene/pull/914


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r903092231


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   Ah, I see. Shoot. It bugs me that we need to copy these ordinals from a list to an array just to do this bulk path lookup, but I see what you're saying. It would be nice if `TaxonomyReader` could directly support `List` in addition to an array, but I don't think this use-case justifies trying to add that right now. Would you mind adding a `TODO` comment here to mention that it would be nice if we didn't need to do this copy just to look up bulk paths? We can leave it at that for now and optimize later if/as necessary. Thanks for pointing this out!



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##########
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);
+      return getPathResult(dimConfig, dim, path, pathOrd, dimTree.iterator(pathOrd));
+    } else {
+      if (path.length > 0) {
+        throw new IllegalArgumentException(
+            "Field is not configured as hierarchical, path should be 0 length");
+      }
+      OrdRange ordRange = state.getOrdRange(dim);
+      if (ordRange == null) {
+        // means dimension was never indexed
+        return null;
+      }
+      int dimOrd = ordRange.start;
+      PrimitiveIterator.OfInt childIt = ordRange.iterator();
+      if (dimConfig.multiValued && dimConfig.requireDimCount) {
+        // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
+        // the dimension and we need to skip past it so the iterator is positioned on the first
+        // child:
+        childIt.next();
+      }
+      return getPathResult(dimConfig, dim, null, dimOrd, childIt);
+    }
+  }

Review Comment:
   Of course! There was a lot of change happening while you were working on this, so I'm sure you were working against an earlier version and just didn't notice some of the change to getTopChildren. Happy to point them out.



##########
lucene/facet/src/java/org/apache/lucene/facet/Facets.java:
##########
@@ -29,6 +29,12 @@ public abstract class Facets {
   /** Default constructor. */
   public Facets() {}
 
+  /**
+   * Returns all the children labels with non-zero counts under the specified path in the unsorted
+   * order. Returns null if the specified path doesn't exist or if this dimension was never seen.
+   */

Review Comment:
   Sorry to be picky about this, but since it's part of the externally facing API/javadoc, I think it's important to scrutinize a little bit. What do you think of:
   
   ```suggestion
     /**
      * Returns all child labels with non-zero counts under the specified path. Users should make no 
      * assumptions about ordering of the children. Returns null if the specified path doesn't exist or 
      * if this dimension was never seen.
      */
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r896240300


##########
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:
##########
@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }

Review Comment:
   minor: There's some common validation logic between this and `getTopChildren` you could factor out into a common helper method if you thought it made sense.



##########
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:
##########
@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    List<LabelAndValue> labelValues = new ArrayList<>();
+    boolean countsAdded = false;
+    if (hashCounts.size() != 0) {
+      for (LongIntCursor c : hashCounts) {
+        int count = c.value;
+        if (count != 0) {
+          if (countsAdded == false && c.key >= counts.length) {
+            countsAdded = true;
+            appendCounts(labelValues);
+          }
+          labelValues.add(new LabelAndValue(Long.toString(c.key), count));
+        }
+      }
+    }
+
+    if (countsAdded == false) {
+      appendCounts(labelValues);
+    }
+
+    return new FacetResult(
+        field,
+        new String[0],
+        totCount,
+        labelValues.toArray(new LabelAndValue[0]),
+        labelValues.size());

Review Comment:
   It looks like you're trying to maintain value-sort-order, sort of like what `getAllChildrenSortByValue` is doing, but since we don't make any ordering guarantees, I think we can simplify this a little bit. What do you think of doing something like this?
   
   ```suggestion
       List<LabelAndValue> labelValues = new ArrayList<>();
       for (int i = 0; i < counts.length; i++) {
         if (counts[i] != 0) {
           labelValues.add(new LabelAndValue(Long.toString(i), counts[i]));
         }
       }
   
       if (hashCounts.size() != 0) {
         for (LongIntCursor c : hashCounts) {
           int count = c.value;
           if (count != 0) {
             labelValues.add(new LabelAndValue(Long.toString(c.key), c.value));
           }
         }
       }
   
       return new FacetResult(
           field,
           new String[0],
           totCount,
           labelValues.toArray(new LabelAndValue[0]),
           labelValues.size());
   ```



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java:
##########
@@ -101,6 +102,63 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return values[ord];
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {

Review Comment:
   Please see my feedback on `IntTaxonomyFacets`. It should all apply here as well. Thanks!



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##########
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);

Review Comment:
   As a general note, I think we usually prefer to _not_ fully qualify internal class names unless necessary (and prefer to import them directly instead). For example, we're already doing `import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState.DimTree;`, so you can just say `DimTree` here instead of `SortedSetDocValuesReaderState.DimTree`. I'm betting your IDE is setup with this style, but something to just keep an eye on.



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##########
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);
+      return getPathResult(dimConfig, dim, path, pathOrd, dimTree.iterator(pathOrd));
+    } else {
+      if (path.length > 0) {
+        throw new IllegalArgumentException(
+            "Field is not configured as hierarchical, path should be 0 length");
+      }
+      OrdRange ordRange = state.getOrdRange(dim);
+      if (ordRange == null) {
+        // means dimension was never indexed
+        return null;
+      }
+      int dimOrd = ordRange.start;
+      PrimitiveIterator.OfInt childIt = ordRange.iterator();
+      if (dimConfig.multiValued && dimConfig.requireDimCount) {
+        // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
+        // the dimension and we need to skip past it so the iterator is positioned on the first
+        // child:
+        childIt.next();
+      }
+      return getPathResult(dimConfig, dim, null, dimOrd, childIt);
+    }
+  }

Review Comment:
   I think we can actually optimize this a little bit when looking up the dim ord and preparing the child iterator by borrowing some ideas from `getTopChildren`. Also, I don't think we need to break out the `getPathResult` method since we only call it from this method (and it's easy to collapse the invocation to a single place). So I'd suggest we inline that logic. Here's what I'm thinking. Hope you don't mind me taking a pass at this (and then remove `getPathResult` completely):
   
   ```suggestion
   @Override
     public FacetResult getAllChildren(String dim, String... path) throws IOException {
       FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
   
       // Determine the path ord and resolve an iterator to its immediate children. The logic for this
       // depends on whether-or-not the dimension is configured as hierarchical:
       final int pathOrd;
       final PrimitiveIterator.OfInt childIterator;
       if (dimConfig.hierarchical) {
         DimTree dimTree = state.getDimTree(dim);
         if (path.length > 0) {
           pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path)));
         } else {
           // If there's no path, this is a little more efficient to just look up the dim:
           pathOrd = dimTree.dimStartOrd;
         }
         if (pathOrd < 0) {
           // path was never indexed
           return null;
         }
         childIterator = dimTree.iterator(pathOrd);
       } else {
         if (path.length > 0) {
           throw new IllegalArgumentException(
             "Field is not configured as hierarchical, path should be 0 length");
         }
         OrdRange ordRange = state.getOrdRange(dim);
         if (ordRange == null) {
           // means dimension was never indexed
           return null;
         }
         pathOrd = ordRange.start;
         childIterator = ordRange.iterator();
         if (dimConfig.multiValued && dimConfig.requireDimCount) {
           // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
           // the dimension and we need to skip past it so the iterator is positioned on the first
           // child:
           childIterator.next();
         }
       }
   
       // Compute the actual results:
       int pathCount = 0;
   
       List<LabelAndValue> labelValues = new ArrayList<>();
       while (childIterator.hasNext()) {
         int ord = childIterator.next();
         int count = getCount(ord);
         if (count > 0) {
           pathCount += count;
           final BytesRef term = dv.lookupOrd(ord);
           String[] parts = FacetsConfig.stringToPath(term.utf8ToString());
           labelValues.add(new LabelAndValue(parts[parts.length - 1], count));
         }
       }
   
       if (dimConfig.hierarchical) {
         pathCount = getCount(pathOrd);
       } else {
         // see if pathCount is actually reliable or needs to be reset
         if (dimConfig.multiValued) {
           if (dimConfig.requireDimCount) {
             pathCount = getCount(pathOrd);
           } else {
             pathCount = -1; // pathCount is inaccurate at this point, so set it to -1
           }
         }
       }
   
       return new FacetResult(
         dim, path, pathCount, labelValues.toArray(new LabelAndValue[0]), labelValues.size());
     }
   ```



##########
lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java:
##########
@@ -130,6 +131,41 @@ public StringValueFacetCounts(StringDocValuesReaderState state, FacetsCollector
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    int childCount = 0; // total number of labels with non-zero count

Review Comment:
   minor: Instead of keeping track of `childCound`, I think you can just reference `labelValues.size()` when you go to create `FacetResult` right?



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;

Review Comment:
   I don't think we need this local variable right? Can you just use `ordinals.size()` when you go to create the response?



##########
lucene/facet/src/java/org/apache/lucene/facet/Facets.java:
##########
@@ -29,6 +29,12 @@ public abstract class Facets {
   /** Default constructor. */
   public Facets() {}
 
+  /**
+   * Returns all the children labels under the specified path. Returns null if the specified path

Review Comment:
   Maybe be specific that it returns all child labels *with non-zero counts*? Also, I wonder if we should mention something about the lack of a specific sort order? I could see some users assuming it would sort by count, but we're intentionally not doing that right? So maybe a note that users should generally make no assumptions about the child sort order?



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java:
##########
@@ -381,7 +381,7 @@ public OrdRange getOrdRange(String dim) {
   public DimTree getDimTree(String dim) {
     if (config.getDimConfig(dim).hierarchical == false) {
       throw new UnsupportedOperationException(
-          "This opperation is only supported for hierarchical facets");
+          "This operation is only supported for hierarchical facets");

Review Comment:
   Nice catch :)



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   Thanks for using `getBulkPath` below, but there's no need for this `ordinalArray` is there? Can't you just use `ordinals` directly for `getBulkPath` (and remove this code completely)?



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -216,6 +216,24 @@ protected void count(String field, List<FacetsCollector.MatchingDocs> matchingDo
     totCount -= missingCount;
   }
 
+  // Return range labels in the order that specified by the user
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+    LabelAndValue[] labelValues = new LabelAndValue[counts.length];
+    for (int i = 0; i < counts.length; i++) {
+      labelValues[i] = new LabelAndValue(ranges[i].label, counts[i]);
+    }
+    return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
+  }
+
+  // TODO: fix getTopChildren in LUCENE-10538
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) {

Review Comment:
   Could you change `getTopChildren` to just call `getAllChildren` for now instead of having the copy/paste code? And maybe add a nice comment explaining what's going on?



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -216,6 +216,24 @@ protected void count(String field, List<FacetsCollector.MatchingDocs> matchingDo
     totCount -= missingCount;
   }
 
+  // Return range labels in the order that specified by the user

Review Comment:
   Could you use javadoc please? Maybe something like:
   
   ```suggestion
     /**
      * {@inheritDoc}
      * <p>
      * NOTE: This implementation guarantees that ranges will be returned in the order specified by
      * the user when calling the constructor.
      */
   ```



##########
lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java:
##########
@@ -130,6 +131,41 @@ public StringValueFacetCounts(StringDocValuesReaderState state, FacetsCollector
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {

Review Comment:
   minor: There some common validation logic shared with getTopChildren you could factor out if you think it makes sense.



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -216,6 +216,24 @@ protected void count(String field, List<FacetsCollector.MatchingDocs> matchingDo
     totCount -= missingCount;
   }
 
+  // Return range labels in the order that specified by the user
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+    LabelAndValue[] labelValues = new LabelAndValue[counts.length];
+    for (int i = 0; i < counts.length; i++) {
+      labelValues[i] = new LabelAndValue(ranges[i].label, counts[i]);
+    }
+    return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
+  }
+
+  // TODO: fix getTopChildren in LUCENE-10538

Review Comment:
   I think LUCENE-10538 is a little overloaded, so I created a new issue to track the specific idea of creating a true `getTopChildren` implementation for range faceting. Could you reference LUCENE-10614 instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org