You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/09/28 22:20:33 UTC

[GitHub] [lucene] gautamworah96 commented on a change in pull request #304: LUCENE 10080 Added FixedBitSet for one counts when counting taxonomy facet labels

gautamworah96 commented on a change in pull request #304:
URL: https://github.com/apache/lucene/pull/304#discussion_r718007188



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##########
@@ -43,7 +47,7 @@ protected IntTaxonomyFacets(
     super(indexFieldName, taxoReader, config);
 
     if (useHashTable(fc, taxoReader)) {
-      sparseValues = new IntIntHashMap();
+      sparseValues = new IntIntHashMapWithFixedBitSet(taxoReader.getSize());

Review comment:
       I think we can definitely experiment with the conditions under which this logic is triggered. For example, I can imagine cases where there are a lot of matches (say > 10% of the index, which triggers the code to use the `values` array) and which has a lot of sparse ordinals i.e, ordinals with a count of just 0.
   
   In that case we could benefit from an optimization that creates a large bitset with most values set to 0 instead of a large `values[int]` array with most values set to 0.
   
    Another idea might be to remove both these `hashtable` and `values[]` array ideas and keep just the bitset backed by a hashtable idea (this PR). 
   
   Maybe you could have a `wikimedium10m` run with a small print statement that records whether the class used a sparse or a dense data-structure (and then get the counts).  Based on those results we could then decide on where to add this logic

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##########
@@ -253,4 +257,92 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
 
     return new FacetResult(dim, path, totValue, labelValues, childCount);
   }
+
+  /**
+   * Class that uses FixedBitSet to store counts for all ordinals with 1 count and IntIntHashMap for
+   * all other counts
+   */
+  protected static class IntIntHashMapWithFixedBitSet implements Iterable<IntIntCursor> {

Review comment:
       +1 there is a similar class called `FloatTaxonomyFacets` but it does not currently use this sparse counting mechanism. If we see some performance gains from this approach, we could definitively extend it to that class as well. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org