You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/29 19:21:46 UTC

[GitHub] [pinot] richardstartin opened a new pull request, #10043: evaluate EQ queries against range index when available

richardstartin opened a new pull request, #10043:
URL: https://github.com/apache/pinot/pull/10043

   This won't build yet because we haven't released the `RangeBitmap` changes yet, but this is here to build consensus that the changes to the query layer (letting `RangeIndexBasedFilterOperator` know about the `EqualityFilterOperator` interface) is appropriate. There's another way of implementing this with fewer code changes by translating an equality predicate to a range predicate `[value, value]`, but this feels less of a hack, or potentially introducing a new operator. I've extended the existing tests and have had a clean run locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1062985665


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {

Review Comment:
   This part can be simplified. We should be able to apply the range index when:
   1. Range index exist and is exact
   2. Column is dictionary encoded or has supported data types
   3. Predicate is EQ or RANGE (RANGE won't hit this method though)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/BitSlicedRangeIndexReader.java:
##########
@@ -94,6 +94,32 @@ public int getNumMatchingDocs(double min, double max) {
     return queryRangeBitmapCardinality(FPOrdering.ordinalOf(min), FPOrdering.ordinalOf(max), 0xFFFFFFFFFFFFFFFFL);
   }
 
+  @Override
+  public int getNumMatchingDocs(int value) {
+    if (value < _min) {
+      return 0;
+    }
+    return queryRangeBitmapCardinality(Math.max(value, _min) - _min, _max - _min);

Review Comment:
   Can be simplified, same for long type
   ```suggestion
       return queryRangeBitmapCardinality(value - _min, _max - _min);
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator.isDictionaryBased()
+                || SUPPORTED_RAW_DATA_TYPES.contains(predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact()
+                    && predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator;
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+          default:
+        }
+      }
+      return false;
+    }
+
     static RangeEvaluator of(RangeIndexReader<ImmutableRoaringBitmap> rangeIndexReader,
         PredicateEvaluator predicateEvaluator) {
-      if (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator) {
-        return new IntRangeEvaluator(rangeIndexReader,
-            (SortedDictionaryBasedRangePredicateEvaluator) predicateEvaluator);
+      if (predicateEvaluator.isDictionaryBased()) {
+        if (predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) {
+          return new IntPointEvaluator(rangeIndexReader,
+                  (EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) predicateEvaluator);

Review Comment:
   Can we follow the same way as how we extract the predicate value from the range predicate evaluator: make the child class public, and add a method getValue() or getDictId(). I don't see much value added by this new interface, and tend to keep the handling consistent



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator.isDictionaryBased()
+                || SUPPORTED_RAW_DATA_TYPES.contains(predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact()
+                    && predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator;
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+          default:
+        }
+      }
+      return false;
+    }
+
     static RangeEvaluator of(RangeIndexReader<ImmutableRoaringBitmap> rangeIndexReader,
         PredicateEvaluator predicateEvaluator) {
-      if (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator) {
-        return new IntRangeEvaluator(rangeIndexReader,
-            (SortedDictionaryBasedRangePredicateEvaluator) predicateEvaluator);
+      if (predicateEvaluator.isDictionaryBased()) {

Review Comment:
   Suggest branching on the predicate type first for readability. Basically add an if check on `predicateEvaluator.getPredicateType()` first, then do the switch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065675298


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(

Review Comment:
   Against my better judgement, I have removed the majority of the checks here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1063353961


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator.isDictionaryBased()
+                || SUPPORTED_RAW_DATA_TYPES.contains(predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact()
+                    && predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator;
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+          default:
+        }
+      }
+      return false;
+    }
+
     static RangeEvaluator of(RangeIndexReader<ImmutableRoaringBitmap> rangeIndexReader,
         PredicateEvaluator predicateEvaluator) {
-      if (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator) {
-        return new IntRangeEvaluator(rangeIndexReader,
-            (SortedDictionaryBasedRangePredicateEvaluator) predicateEvaluator);
+      if (predicateEvaluator.isDictionaryBased()) {
+        if (predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) {
+          return new IntPointEvaluator(rangeIndexReader,
+                  (EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) predicateEvaluator);

Review Comment:
   On second thoughts, I don't think we're far enough apart to go back and forth about this, so I've just made the suggested change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065674880


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(
+          predicateEvaluator.getDataType()));
+      switch (predicateEvaluator.getPredicateType()) {
+        case EQ:
+          return datatypeSupported && dataSource.getRangeIndex().isExact() && dataSource.getRangeIndex().isExact() && (
+              predicateEvaluator instanceof DictionaryBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof IntRawValueBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof LongRawValueBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof FloatRawValueBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof DoubleRawValueBasedEqPredicateEvaluator);
+        case RANGE:
+          return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+        default:

Review Comment:
   this code has been removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065591440


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(

Review Comment:
   Self documenting code tends to avoid implicit dependencies like that. This way, you have a _declaration of requirements_ in one place which reduces cognitive overhead for the reader. The overhead of these kinds of checks is also rather small, especially if the method gets inlined and some of the redundancy at the call site gets removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065588654


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/EqualsPredicateEvaluatorFactory.java:
##########
@@ -82,202 +82,6 @@ public static BaseRawValueBasedPredicateEvaluator newRawValueBasedEvaluator(EqPr
     }
   }
 
-  private static final class DictionaryBasedEqPredicateEvaluator extends BaseDictionaryBasedPredicateEvaluator {

Review Comment:
   I mentioned why I made this change here https://github.com/apache/pinot/pull/10043#discussion_r1063363939
   I feel the single interface approach made this less cumbersome; I wouldn't need to care about how long the class names are for formatting reasons if the level of coupling were reduced.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #10043:
URL: https://github.com/apache/pinot/pull/10043#issuecomment-1371979974

   The compatibility verifier is failing because the numDocsScannedInFilter increases, but timeUsedMs decreases by a large factor in each case: e.g. `numEntriesScannedInFilter` increased from 12 to 69, but `timeUsedMs` decreased from 20ms to 3ms.
   
   ```
   2023/01/05 04:11:35.051 ERROR [QueryOp] [main] Comparison FAILED: Line: 23, query: 'SELECT longDimSV1, doubleDimSV1 from FeatureTest3 WHERE doubleDimSV1 > 99 AND generationNumber = 1 LIMIT 1000', actual response: {"resultTable":{"dataSchema":{"columnNames":["longDimSV1","doubleDimSV1"],"columnDataTypes":["LONG","DOUBLE"]},"rows":[[270,99.62],[268,99.08],[183,99.11],[286,99.1]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":60,"numSegmentsProcessed":4,"numSegmentsMatched":3,"numConsumingSegmentsQueried":3,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"numDocsScanned":4,"numEntriesScannedInFilter":69,"numEntriesScannedPostFilter":8,"numGroupsLimitReached":false,"totalDocs":1200,"timeUsedMs":3,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs"
 :0,"realtimeTotalCpuTimeNs":0,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":1672891889033,"explainPlanNumEmptyFilterSegments":0,"numSegmentsPrunedByBroker":0,"numRowsResultSet":4,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":54,"explainPlanNumMatchAllFilterSegments":0,"numSegmentsPrunedByServer":56,"numSegmentsPrunedInvalid":0}, expected response: {"resultTable":{"dataSchema":{"columnNames":["longDimSV1","doubleDimSV1"],"columnDataTypes":["LONG","DOUBLE"]},"rows":[[286,99.1],[183,99.11],[270,99.62],[268,99.08]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":17,"numSegmentsProcessed":3,"numSegmentsMatched":3,"numConsumingSegmentsQueried":3,"numDocsScanned":4,"numEntriesScannedInFilter":12,"numEntriesScannedPostFilter":8,"numGroupsLimitReached":false,"totalDocs":300,"timeUsedMs":20,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlin
 eResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":1641415527398,"numRowsResultSet":4}
   ```
   
   I think this is because filter bitmaps are pushed down to `SVScanDocIdIterator.applyAnd`, so only the already filtered doc ids are counted towards the "cost" of the scan, but they are not pushed down to the range filter, so even though the range filter is faster than scanning, it has to consider more rows before filtering them out, and reports a higher `numEntriesScannedInFilter`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065638996


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/EqualsPredicateEvaluatorFactory.java:
##########
@@ -82,202 +82,6 @@ public static BaseRawValueBasedPredicateEvaluator newRawValueBasedEvaluator(EqPr
     }
   }
 
-  private static final class DictionaryBasedEqPredicateEvaluator extends BaseDictionaryBasedPredicateEvaluator {

Review Comment:
   I have instead introduced the `traits` package to allow extraction of values/ranges from predicate evaluators without needing to know the concrete type. This reduces coupling and improves local readability within the operator.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #10043: evaluate equals queries against range index when available

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #10043:
URL: https://github.com/apache/pinot/pull/10043


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1063269463


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {

Review Comment:
   I don't think it can be simplified much, and you can only omit the instanceof checks if you're ok with unchecked casts later. At some point this was being used for `RANGE` queries too (and performing the instanceof checks for safety).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1063272372


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator.isDictionaryBased()
+                || SUPPORTED_RAW_DATA_TYPES.contains(predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact()
+                    && predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator;
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+          default:
+        }
+      }
+      return false;
+    }
+
     static RangeEvaluator of(RangeIndexReader<ImmutableRoaringBitmap> rangeIndexReader,
         PredicateEvaluator predicateEvaluator) {
-      if (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator) {
-        return new IntRangeEvaluator(rangeIndexReader,
-            (SortedDictionaryBasedRangePredicateEvaluator) predicateEvaluator);
+      if (predicateEvaluator.isDictionaryBased()) {

Review Comment:
   Yes, I can do that. It's going to be a longer method and harder to fit on screen but I can see benefits.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1063363939


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator.isDictionaryBased()
+                || SUPPORTED_RAW_DATA_TYPES.contains(predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact()
+                    && predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator;
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+          default:
+        }
+      }
+      return false;
+    }
+
     static RangeEvaluator of(RangeIndexReader<ImmutableRoaringBitmap> rangeIndexReader,
         PredicateEvaluator predicateEvaluator) {
-      if (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator) {
-        return new IntRangeEvaluator(rangeIndexReader,
-            (SortedDictionaryBasedRangePredicateEvaluator) predicateEvaluator);
+      if (predicateEvaluator.isDictionaryBased()) {
+        if (predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) {
+          return new IntPointEvaluator(rangeIndexReader,
+                  (EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) predicateEvaluator);

Review Comment:
   However, I have had to move the inner classes up to top level to make the class names shorter, otherwise we run into a particular case where the automatic formatting produces lines longer than 120 characters and gets rejected by checkstyle 🤦🏻 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on PR #10043:
URL: https://github.com/apache/pinot/pull/10043#issuecomment-1370289638

   For clarification, is the intention to use range index when inverted index is not available (e.g. for high cardinality double column)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065689645


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RangePredicateEvaluatorFactory.java:
##########
@@ -214,6 +219,16 @@ public int[] getMatchingDictIds() {
     public int getNumMatchingItems() {
       return Math.max(_numMatchingDictIds, 0);
     }
+
+    @Override
+    public int getInclusiveLowerBound() {
+      return getStartDictId();
+    }
+
+    @Override
+    public int getInclusiveUpperBound() {
+      return getEndDictId() - 1;

Review Comment:
   Pay attention to how details like the last dictionary id being exclusive can be encapsulated with simple abstractions, rather than copying this snippet everywhere a range query needs to be evaluated against a dictionary. It's outside of the scope of this PR, but this pattern can be adopted in `SortedIndexBasedFilterOperator` to reduce coupling there, and would simplify the code changes required to introduce a sorted index without a dictionary in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1063374698


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -124,38 +137,102 @@ public List<Operator> getChildOperators() {
 
   @Override
   public String toExplainString() {
-    return EXPLAIN_NAME + "(indexLookUp:range_index"
-        + ",operator:" + _rangePredicateEvaluator.getPredicateType()
-        + ",predicate:" + _rangePredicateEvaluator.getPredicate().toString()
-        + ')';
+    return EXPLAIN_NAME + "(indexLookUp:range_index" + ",operator:" + _rangePredicateEvaluator.getPredicateType()
+        + ",predicate:" + _rangePredicateEvaluator.getPredicate().toString() + ')';
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+        EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT,
+            FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(
+            predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact() && dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator instanceof DictionaryBasedEqPredicateEvaluator
+                || predicateEvaluator instanceof IntRawValueBasedEqPredicateEvaluator
+                || predicateEvaluator instanceof LongRawValueBasedEqPredicateEvaluator
+                || predicateEvaluator instanceof FloatRawValueBasedEqPredicateEvaluator
+                || predicateEvaluator instanceof DoubleRawValueBasedEqPredicateEvaluator);
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);

Review Comment:
   To me, this level of knowledge of `PredicateEvaluator` implementation details is kind of unacceptable (I actually had to refactor some of the classes to make the class names short enough for ergonomic use within this class) and I actually just need to know "can this thing give me an int for equality queries and a pair of ints for range queries?" 
   
   I have a proposal in mind to make this mapping more loosely defined, so that we don't get this level of coupling, and potentially the mapping could even be soft coded, which would allow some users to just opt out of the change (use the range index to solve equality queries) or even to reprioritise it, so I'd like to accept the state of things for now and come back to the problem in general later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065196146


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(
+          predicateEvaluator.getDataType()));
+      switch (predicateEvaluator.getPredicateType()) {
+        case EQ:
+          return datatypeSupported && dataSource.getRangeIndex().isExact() && dataSource.getRangeIndex().isExact() && (
+              predicateEvaluator instanceof DictionaryBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof IntRawValueBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof LongRawValueBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof FloatRawValueBasedEqPredicateEvaluator
+                  || predicateEvaluator instanceof DoubleRawValueBasedEqPredicateEvaluator);
+        case RANGE:
+          return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+              || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+        default:

Review Comment:
   (minor) add a `return false` for readability



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(
+          predicateEvaluator.getDataType()));
+      switch (predicateEvaluator.getPredicateType()) {
+        case EQ:
+          return datatypeSupported && dataSource.getRangeIndex().isExact() && dataSource.getRangeIndex().isExact() && (

Review Comment:
   `dataSource.getRangeIndex().isExact()` is repeated



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(

Review Comment:
   After a second thought, I feel we can remove most of the checks here. When range index exists, it implies that the column is either dictionary encoded, or has supported data types. All we need to check is for EQ the range index should be exact.
   Although this introduce some assumptions to this module, it reduces the per-segment check overhead at query time.



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/EqualsPredicateEvaluatorFactory.java:
##########
@@ -82,202 +82,6 @@ public static BaseRawValueBasedPredicateEvaluator newRawValueBasedEvaluator(EqPr
     }
   }
 
-  private static final class DictionaryBasedEqPredicateEvaluator extends BaseDictionaryBasedPredicateEvaluator {

Review Comment:
   Suggest keeping them as inner class and changing them to `public` access (same as range predicate). All of them are under the scope of EQ predicate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1063271581


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -131,27 +139,75 @@ public String toExplainString() {
   }
 
   interface RangeEvaluator {
+
+    Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES = EnumSet.of(FieldSpec.DataType.INT,
+            FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+    static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+      if (dataSource.getRangeIndex() != null) {
+        boolean datatypeSupported = dataSource.getRangeIndex().isExact()
+                && (predicateEvaluator.isDictionaryBased()
+                || SUPPORTED_RAW_DATA_TYPES.contains(predicateEvaluator.getDataType()));
+        switch (predicateEvaluator.getPredicateType()) {
+          case EQ:
+            return datatypeSupported && dataSource.getRangeIndex().isExact()
+                    && predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator;
+          case RANGE:
+            return datatypeSupported && (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof IntRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof LongRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof FloatRawValueBasedRangePredicateEvaluator
+                    || predicateEvaluator instanceof DoubleRawValueBasedRangePredicateEvaluator);
+          default:
+        }
+      }
+      return false;
+    }
+
     static RangeEvaluator of(RangeIndexReader<ImmutableRoaringBitmap> rangeIndexReader,
         PredicateEvaluator predicateEvaluator) {
-      if (predicateEvaluator instanceof SortedDictionaryBasedRangePredicateEvaluator) {
-        return new IntRangeEvaluator(rangeIndexReader,
-            (SortedDictionaryBasedRangePredicateEvaluator) predicateEvaluator);
+      if (predicateEvaluator.isDictionaryBased()) {
+        if (predicateEvaluator instanceof EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) {
+          return new IntPointEvaluator(rangeIndexReader,
+                  (EqualsPredicateEvaluatorFactory.EqualsPredicateEvaluator) predicateEvaluator);

Review Comment:
   I added the interface so that the operator knows less about the predicate evaluators to reduce coupling - I think this approach is better and could be retrofitted onto the range predicate evaluators. If it's a dealbreaker, I'll change it, but I think this is better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10043: evaluate equals queries against range index when available

Posted by GitBox <gi...@apache.org>.
gortiz commented on code in PR #10043:
URL: https://github.com/apache/pinot/pull/10043#discussion_r1065799086


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java:
##########
@@ -44,44 +53,74 @@ public class RangeIndexBasedFilterOperator extends BaseFilterOperator {
 
   private static final String EXPLAIN_NAME = "FILTER_RANGE_INDEX";
 
-  private final RangeEvaluator _rangeEvaluator;
-  private final PredicateEvaluator _rangePredicateEvaluator;
+  static final Set<FieldSpec.DataType> SUPPORTED_RAW_DATA_TYPES =
+      EnumSet.of(FieldSpec.DataType.INT, FieldSpec.DataType.LONG, FieldSpec.DataType.FLOAT, FieldSpec.DataType.DOUBLE);
+
+  private final RangeIndexReader<ImmutableRoaringBitmap> _rangeIndexReader;
+  private final PredicateEvaluator _predicateEvaluator;
   private final DataSource _dataSource;
+  private final FieldSpec.DataType _parameterType;
+  private final Object _parameters;
   private final int _numDocs;
 
+  static boolean canEvaluate(PredicateEvaluator predicateEvaluator, DataSource dataSource) {
+    if (dataSource.getRangeIndex() != null) {
+      boolean datatypeSupported = (predicateEvaluator.isDictionaryBased() || SUPPORTED_RAW_DATA_TYPES.contains(

Review Comment:
   I would vote against removing these checks. They shouldn't be that heavy and as appointed by @richardstartin, they made the code easier to understand and even more resilient without an expected impact on performance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #10043:
URL: https://github.com/apache/pinot/pull/10043#issuecomment-1370314922

   > For clarification, is the intention to use range index when inverted index is not available (e.g. for high cardinality double column)
   
   The idea is just to fall back to a range index instead of scanning if there's a range index available, but no other index capable of satisfying an equals query, because it will beat scanning. What users do with this is up to them, but it opens up the possibility of having:
   1. a smaller but slower inverted index for numeric columns without a dictionary, or dictionarized string columns, either as a space optimisation or as an enabler where cardinality would just be too high for an inverted index
   2. potentially make the timestamp index more powerful (e.g. can be used satisfy `hour(timestamp) == 12') with the same underlying index


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10043: evaluate EQ queries against range index when available

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #10043:
URL: https://github.com/apache/pinot/pull/10043#issuecomment-1371598926

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10043](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f72f22e) into [master](https://codecov.io/gh/apache/pinot/commit/c50cef11a79065acb6384336f9b7fb47524cf953?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c50cef1) will **increase** coverage by `0.04%`.
   > The diff coverage is `72.17%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #10043      +/-   ##
   ============================================
   + Coverage     64.23%   64.27%   +0.04%     
   - Complexity     5669     5689      +20     
   ============================================
     Files          1940     1940              
     Lines        105138   105249     +111     
     Branches      16063    16083      +20     
   ============================================
   + Hits          67537    67651     +114     
   + Misses        32717    32707      -10     
   - Partials       4884     4891       +7     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests1 | `67.95% <72.17%> (+0.05%)` | :arrow_up: |
   | unittests2 | `13.57% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...not/segment/spi/index/reader/RangeIndexReader.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9SYW5nZUluZGV4UmVhZGVyLmphdmE=) | `11.11% <0.00%> (-88.89%)` | :arrow_down: |
   | [...ter/predicate/EqualsPredicateEvaluatorFactory.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9maWx0ZXIvcHJlZGljYXRlL0VxdWFsc1ByZWRpY2F0ZUV2YWx1YXRvckZhY3RvcnkuamF2YQ==) | `75.43% <31.25%> (-3.14%)` | :arrow_down: |
   | [...inot/core/operator/filter/FilterOperatorUtils.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9maWx0ZXIvRmlsdGVyT3BlcmF0b3JVdGlscy5qYXZh) | `84.33% <75.00%> (+0.58%)` | :arrow_up: |
   | [...operator/filter/RangeIndexBasedFilterOperator.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9maWx0ZXIvUmFuZ2VJbmRleEJhc2VkRmlsdGVyT3BlcmF0b3IuamF2YQ==) | `74.64% <79.66%> (+3.71%)` | :arrow_up: |
   | [...gment/index/readers/BitSlicedRangeIndexReader.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvQml0U2xpY2VkUmFuZ2VJbmRleFJlYWRlci5qYXZh) | `92.47% <100.00%> (+3.24%)` | :arrow_up: |
   | [...e/pinot/core/transport/InstanceRequestHandler.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS90cmFuc3BvcnQvSW5zdGFuY2VSZXF1ZXN0SGFuZGxlci5qYXZh) | `59.37% <0.00%> (-2.35%)` | :arrow_down: |
   | [.../java/org/apache/pinot/spi/data/TimeFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9UaW1lRmllbGRTcGVjLmphdmE=) | `88.63% <0.00%> (-2.28%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `93.45% <0.00%> (-1.87%)` | :arrow_down: |
   | [...elix/core/periodictask/ControllerPeriodicTask.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3BlcmlvZGljdGFzay9Db250cm9sbGVyUGVyaW9kaWNUYXNrLmphdmE=) | `64.28% <0.00%> (-1.79%)` | :arrow_down: |
   | [...lix/core/realtime/PinotRealtimeSegmentManager.java](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3JlYWx0aW1lL1Bpbm90UmVhbHRpbWVTZWdtZW50TWFuYWdlci5qYXZh) | `78.00% <0.00%> (-1.00%)` | :arrow_down: |
   | ... and [13 more](https://codecov.io/gh/apache/pinot/pull/10043?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org