You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2019/06/21 01:27:55 UTC

[GitHub] [incubator-pinot] sunithabeeram commented on a change in pull request #4350: For RANGE and REGEXP operators, if there is single matching dictionary id, use inverted index

sunithabeeram commented on a change in pull request #4350: For RANGE and REGEXP operators, if there is single matching dictionary id, use inverted index
URL: https://github.com/apache/incubator-pinot/pull/4350#discussion_r296067652
 
 

 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
 ##########
 @@ -53,18 +53,25 @@ public static BaseFilterOperator getLeafFilterOperator(PredicateEvaluator predic
     // TODO: make it exclusive
     int endDocId = numDocs - 1;
 
-    // Use inverted index if the predicate type is not RANGE or REGEXP_LIKE for efficiency
+    // Use scan-based operator if inverted index does not exist or the predicate type is RANGE or REGEXP_LIKE with more
+    // than 1 matching dictionary ids
+    // NOTE: allow RANGE with single matching dictionary id to use inverted index is very useful for time column with
+    //       DAYS granularity and time values across two days
+    // TODO: whether to use inverted index should be based on the number of matching dictionary ids and cardinality of
+    //       the column instead of the predicate type
+    // TODO: if column is sorted, should always use sorted index for RANGE predicate
     DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
     Predicate.Type predicateType = predicateEvaluator.getPredicateType();
-    if (dataSourceMetadata.hasInvertedIndex() && (predicateType != Predicate.Type.RANGE) && (predicateType
-        != Predicate.Type.REGEXP_LIKE)) {
+    if (!dataSourceMetadata.hasInvertedIndex() || (
+        (predicateType == Predicate.Type.RANGE || predicateType == Predicate.Type.REGEXP_LIKE)
+            && predicateEvaluator.getNumMatchingDictIds() > 1)) {
 
 Review comment:
   I think this can be higher (atleast for range) even for this version. We currently support IN clause for a LOT of values and after the roaring bit-map upgrade, the performance penalty is much lower. I would atleast make this configurable so that it can be updated easily (perhaps via an external or internal tuner).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org