Posted to commits@pinot.apache.org by ja...@apache.org on 2019/06/20 22:55:56 UTC
[incubator-pinot] 01/01: For RANGE and REGEXP operators, if there is single matching dictionary id, use inverted index

This is an automated email from the ASF dual-hosted git repository.

jackie pushed a commit to branch optimize_range
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 4e4cae03ca300af44309801e00d84774cd6460d9
Author: Jackie (Xiaotian) Jiang <xa...@linkedin.com>
AuthorDate: Thu Jun 20 15:54:09 2019 -0700

    For RANGE and REGEXP operators, if there is single matching dictionary id, use inverted index
    
    This is especially useful for a time column with DAYS granularity whose time values span two days.
    This optimization can apply to most hybrid use cases.
---
 .../pinot/core/operator/filter/FilterOperatorUtils.java | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java b/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
index bc46b2f..0483859 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
@@ -53,18 +53,25 @@ public class FilterOperatorUtils {
     // TODO: make it exclusive
     int endDocId = numDocs - 1;
 
-    // Use inverted index if the predicate type is not RANGE or REGEXP_LIKE for efficiency
+    // Use scan-based operator if inverted index does not exist or the predicate type is RANGE or REGEXP_LIKE with more
+    // than 1 matching dictionary ids
+    // NOTE: allow RANGE with single matching dictionary id to use inverted index is very useful for time column with
+    //       DAYS granularity and time values across two days
+    // TODO: whether to use inverted index should be based on the number of matching dictionary ids and cardinality of
+    //       the column instead of the predicate type
+    // TODO: if column is sorted, should always use sorted index for RANGE predicate
     DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
     Predicate.Type predicateType = predicateEvaluator.getPredicateType();
-    if (dataSourceMetadata.hasInvertedIndex() && (predicateType != Predicate.Type.RANGE) && (predicateType
-        != Predicate.Type.REGEXP_LIKE)) {
+    if (!dataSourceMetadata.hasInvertedIndex() || (
+        (predicateType == Predicate.Type.RANGE || predicateType == Predicate.Type.REGEXP_LIKE)
+            && predicateEvaluator.getNumMatchingDictIds() > 1)) {
+      return new ScanBasedFilterOperator(predicateEvaluator, dataSource, startDocId, endDocId);
+    } else {
       if (dataSourceMetadata.isSorted()) {
         return new SortedInvertedIndexBasedFilterOperator(predicateEvaluator, dataSource, startDocId, endDocId);
       } else {
         return new BitmapBasedFilterOperator(predicateEvaluator, dataSource, startDocId, endDocId);
       }
-    } else {
-      return new ScanBasedFilterOperator(predicateEvaluator, dataSource, startDocId, endDocId);
     }
   }
 

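The operator selection after this patch can be sketched as a standalone decision function. This is a minimal sketch, not Pinot code: the class name, both enums, and the choose() helper are hypothetical stand-ins for Predicate.Type and the three filter operator classes, and only the branching mirrors the condition in the diff above.

```java
// Hypothetical stand-in for the selection logic in FilterOperatorUtils.
// Nothing here is Pinot API; only the branch structure follows the patch.
final class FilterOperatorChoiceSketch {
  // Assumed subset of predicate types, for illustration only.
  enum PredicateType { EQ, IN, RANGE, REGEXP_LIKE }

  // Stand-ins for ScanBasedFilterOperator, SortedInvertedIndexBasedFilterOperator
  // and BitmapBasedFilterOperator.
  enum OperatorKind { SCAN, SORTED_INVERTED_INDEX, BITMAP }

  static OperatorKind choose(boolean hasInvertedIndex, boolean isSorted, PredicateType type,
      int numMatchingDictIds) {
    boolean rangeOrRegexp = type == PredicateType.RANGE || type == PredicateType.REGEXP_LIKE;
    // Scan when there is no inverted index, or when a RANGE/REGEXP_LIKE predicate
    // matches more than one dictionary id.
    if (!hasInvertedIndex || (rangeOrRegexp && numMatchingDictIds > 1)) {
      return OperatorKind.SCAN;
    }
    // Otherwise use the index, preferring the sorted variant when the column is sorted.
    return isSorted ? OperatorKind.SORTED_INVERTED_INDEX : OperatorKind.BITMAP;
  }

  public static void main(String[] args) {
    // RANGE that resolves to a single dictionary id: the index is now used.
    System.out.println(choose(true, false, PredicateType.RANGE, 1));
    // RANGE that resolves to several dictionary ids: still a scan.
    System.out.println(choose(true, false, PredicateType.RANGE, 3));
    // Non-RANGE predicate on a sorted column: sorted index, as before the patch.
    System.out.println(choose(true, true, PredicateType.EQ, 1));
  }
}
```

Under these assumptions, a RANGE predicate over a DAYS-granularity time column that matches exactly one day value takes the index path, which is the case the commit message describes.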
