Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/05/05 06:41:48 UTC

[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #5331: Optimize RealtimeDictionaryBasedRangePredicateEvaluator by not scanning the dictionary when cardinality is high

fx19880617 commented on a change in pull request #5331:
URL: https://github.com/apache/incubator-pinot/pull/5331#discussion_r419894101



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RangePredicateEvaluatorFactory.java
##########
@@ -169,19 +171,59 @@ public int getNumMatchingDictIds() {
   }
 
   private static final class RealtimeDictionaryBasedRangePredicateEvaluator extends BaseDictionaryBasedPredicateEvaluator {
+    // When the cardinality of the column is lower than this threshold, pre-calculate the matching dictionary ids;
+    // otherwise, fetch the value when evaluating each dictionary id.
+    // TODO: Tune this threshold
+    private static final int DICT_ID_SET_BASED_CARDINALITY_THRESHOLD = 1000;
+
+    final BaseMutableDictionary _dictionary;
+    final DataType _dataType;
+    final boolean _dictIdSetBased;
     final IntSet _matchingDictIdSet;
-    final int _numMatchingDictIds;
-    int[] _matchingDictIds;
-
-    RealtimeDictionaryBasedRangePredicateEvaluator(RangePredicate rangePredicate, BaseMutableDictionary dictionary) {
-      _matchingDictIdSet = dictionary
-          .getDictIdsInRange(rangePredicate.getLowerBoundary(), rangePredicate.getUpperBoundary(),
-              rangePredicate.includeLowerBoundary(), rangePredicate.includeUpperBoundary());
-      _numMatchingDictIds = _matchingDictIdSet.size();
-      if (_numMatchingDictIds == 0) {
-        _alwaysFalse = true;
-      } else if (_numMatchingDictIds == dictionary.length()) {
-        _alwaysTrue = true;
+    final BaseRawValueBasedPredicateEvaluator _rawValueBasedEvaluator;
+
+    RealtimeDictionaryBasedRangePredicateEvaluator(RangePredicate rangePredicate, BaseMutableDictionary dictionary,
+        DataType dataType) {
+      _dictionary = dictionary;
+      _dataType = dataType;
+      int cardinality = dictionary.length();
+      if (cardinality < DICT_ID_SET_BASED_CARDINALITY_THRESHOLD) {
+        _dictIdSetBased = true;
+        _rawValueBasedEvaluator = null;
+        _matchingDictIdSet = dictionary
+            .getDictIdsInRange(rangePredicate.getLowerBoundary(), rangePredicate.getUpperBoundary(),
+                rangePredicate.includeLowerBoundary(), rangePredicate.includeUpperBoundary());
+        int numMatchingDictIds = _matchingDictIdSet.size();
+        if (numMatchingDictIds == 0) {
+          _alwaysFalse = true;
+        } else if (numMatchingDictIds == cardinality) {
+          _alwaysTrue = true;
+        }
+      } else {
+        _dictIdSetBased = false;
+        _matchingDictIdSet = null;
+        switch (dataType) {

Review comment:
       Not related to this PR, but should we start thinking about how to simplify these switch-case code blocks?
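
       One possible direction, offered only as a rough sketch: replace the per-type switch with a lookup table keyed by data type. Everything below is hypothetical and not taken from the Pinot code base. `RangeEvaluatorSketch`, its nested `DataType` enum, `PARSERS`, and `rangePredicate` are illustrative stand-ins, and the sketch assumes the switch exists only to pick type-specific parsing and comparison logic.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in Pinot.
final class RangeEvaluatorSketch {
  // Stand-in for the real DataType enum (assumption: only these four types matter here).
  enum DataType { INT, LONG, DOUBLE, STRING }

  // One parser per data type, registered once instead of repeated in every switch.
  private static final Map<DataType, Function<String, Comparable<?>>> PARSERS =
      new EnumMap<>(DataType.class);

  static {
    PARSERS.put(DataType.INT, Integer::valueOf);
    PARSERS.put(DataType.LONG, Long::valueOf);
    PARSERS.put(DataType.DOUBLE, Double::valueOf);
    PARSERS.put(DataType.STRING, s -> s);
  }

  // Builds a range check for the given type without a switch statement.
  // The value passed to the returned predicate must have the boxed type
  // matching dataType (e.g. Integer for INT), otherwise compareTo throws.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Predicate<Object> rangePredicate(DataType dataType, String lower, String upper,
      boolean includeLower, boolean includeUpper) {
    Function<String, Comparable<?>> parser = PARSERS.get(dataType);
    Comparable lo = parser.apply(lower);
    Comparable hi = parser.apply(upper);
    return value -> {
      int cmpLo = lo.compareTo(value); // <= 0 means lower bound <= value
      int cmpHi = hi.compareTo(value); // >= 0 means upper bound >= value
      boolean aboveLower = includeLower ? cmpLo <= 0 : cmpLo < 0;
      boolean belowUpper = includeUpper ? cmpHi >= 0 : cmpHi > 0;
      return aboveLower && belowUpper;
    };
  }
}
```

       The trade-off is that this version boxes every value, which a switch over primitive types avoids, so it may not be acceptable on a hot query path without measurement.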




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


