Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/22 02:04:33 UTC

[GitHub] [druid] FrankChen021 commented on a diff in pull request #13133: Composite approach for checking in-filter values set in column dictionary

FrankChen021 commented on code in PR #13133:
URL: https://github.com/apache/druid/pull/13133#discussion_r977120670


##########
processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedStringIndexSupplier.java:
##########
@@ -280,15 +287,35 @@ public ImmutableBitmap next()
 
             private void findNext()
             {
-              while (next < 0 && iterator.hasNext()) {
-                ByteBuffer nextValue = iterator.next();
-                next = dictionary.indexOf(nextValue);
-
-                if (next == -dictionarySize - 1) {
-                  // nextValue is past the end of the dictionary.
-                  // Note: we can rely on indexOf returning (-(insertion point) - 1), even though Indexed doesn't
-                  // guarantee it, because "dictionary" comes from GenericIndexed singleThreaded().
-                  break;
+              // if the size of in-filter values is less than the threshold percentage of dictionary size, then use binary search
+              // based lookup per value. The algorithm works well for smaller number of values.
+              if (size < SORTED_MERGE_RATIO_THRESHOLD * dictionary.size()) {

Review Comment:
   We can determine the strategy at the point where the `Iterator<ImmutableBitmap>` object is instantiated, instead of re-checking it on every `findNext()` call.
   
   It would also be much cleaner to split the returned `Iterator<ImmutableBitmap>` into two inner classes: one for the binary search and one for the sorted merge.
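
A rough sketch of the shape I have in mind, not the actual Druid code. It is simplified to be self-contained: it yields dictionary ids (`Integer`) instead of `ImmutableBitmap`, uses a sorted `String[]` in place of the real dictionary, and assumes the in-filter values are sorted and distinct. The class names (`InFilterLookupSketch`, `BinarySearchIterator`, `SortedMergeIterator`), the `collect` helper, and the threshold value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class InFilterLookupSketch
{
  // Hypothetical value for illustration only; not taken from the PR.
  static final double SORTED_MERGE_RATIO_THRESHOLD = 0.5;

  // The strategy is chosen once, when the iterator is instantiated.
  static Iterator<Integer> matchingIds(List<String> sortedValues, String[] sortedDictionary)
  {
    if (sortedValues.size() < SORTED_MERGE_RATIO_THRESHOLD * sortedDictionary.length) {
      return new BinarySearchIterator(sortedValues.iterator(), sortedDictionary);
    }
    return new SortedMergeIterator(sortedValues.iterator(), sortedDictionary);
  }

  // Shared hasNext()/next() plumbing; subclasses only implement findNext().
  abstract static class LookupIterator implements Iterator<Integer>
  {
    final Iterator<String> values;
    final String[] dictionary;
    int nextId = -1;

    LookupIterator(Iterator<String> values, String[] dictionary)
    {
      this.values = values;
      this.dictionary = dictionary;
    }

    abstract void findNext();

    @Override
    public boolean hasNext()
    {
      if (nextId < 0) {
        findNext();
      }
      return nextId >= 0;
    }

    @Override
    public Integer next()
    {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      int result = nextId;
      nextId = -1;
      return result;
    }
  }

  // One binary search per value; suited to a small number of values.
  static final class BinarySearchIterator extends LookupIterator
  {
    BinarySearchIterator(Iterator<String> values, String[] dictionary)
    {
      super(values, dictionary);
    }

    @Override
    void findNext()
    {
      while (nextId < 0 && values.hasNext()) {
        // Arrays.binarySearch returns (-(insertion point) - 1) when the key is absent.
        nextId = Arrays.binarySearch(dictionary, values.next());
        if (nextId == -dictionary.length - 1) {
          // The value sorts past the end of the dictionary, so later values will too.
          break;
        }
      }
    }
  }

  // Single forward pass over the dictionary; suited to many values.
  static final class SortedMergeIterator extends LookupIterator
  {
    private int dictPos = 0;

    SortedMergeIterator(Iterator<String> values, String[] dictionary)
    {
      super(values, dictionary);
    }

    @Override
    void findNext()
    {
      while (nextId < 0 && values.hasNext() && dictPos < dictionary.length) {
        String value = values.next();
        // Skip dictionary entries that sort before the current value.
        while (dictPos < dictionary.length && dictionary[dictPos].compareTo(value) < 0) {
          dictPos++;
        }
        if (dictPos < dictionary.length && dictionary[dictPos].equals(value)) {
          nextId = dictPos++;
        }
      }
    }
  }

  // Hypothetical helper that drains an iterator into a list.
  static List<Integer> collect(Iterator<Integer> it)
  {
    List<Integer> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

With this layout the size check runs once in `matchingIds`, and each `findNext()` contains only the loop for its own strategy, which should make both paths easier to read and test separately.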



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

