Posted to commits@hudi.apache.org by "beyond1920 (via GitHub)" <gi...@apache.org> on 2023/02/27 03:04:22 UTC

[GitHub] [hudi] beyond1920 opened a new pull request, #8051: Improvement of data skip, avoid conversion frequently

beyond1920 opened a new pull request, #8051:
URL: https://github.com/apache/hudi/pull/8051

   ### Change Logs
   1. Refactor `ExpressionEvaluators` into two phases: first convert the expressions to evaluators, then test, for each evaluator, whether a match is possible based on the column stats.
   2. Improve data skipping so that expressions are not converted to evaluators repeatedly.
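The two phases described in the change log can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `ColumnStats`, `Filter`, `Evaluator`, `toEvaluators` and `mayMatch` are hypothetical names, filters are reduced to a single `long` comparison, and a missing column statistic is assumed to mean "cannot skip".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from the Hudi code base.
final class TwoPhaseSkippingSketch {

  /** Min/max statistics of one column within one file. */
  static final class ColumnStats {
    final long min;
    final long max;

    ColumnStats(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Stand-in for a resolved filter expression such as "age > 30". */
  static final class Filter {
    final String column;
    final String op;
    final long literal;

    Filter(String column, String op, long literal) {
      this.column = column;
      this.op = op;
      this.literal = literal;
    }
  }

  /** Phase 2 contract: may any row of a file with these stats match? */
  interface Evaluator {
    boolean eval(Map<String, ColumnStats> statsByColumn);
  }

  /** Phase 1: convert one filter into an evaluator. */
  static Evaluator toEvaluator(Filter f) {
    switch (f.op) {
      case ">":
        return stats -> !stats.containsKey(f.column) || stats.get(f.column).max > f.literal;
      case "<":
        return stats -> !stats.containsKey(f.column) || stats.get(f.column).min < f.literal;
      case "=":
        return stats -> !stats.containsKey(f.column)
            || (stats.get(f.column).min <= f.literal && f.literal <= stats.get(f.column).max);
      default:
        // An operator this sketch does not understand must never skip a file.
        return stats -> true;
    }
  }

  /** Phase 1 for a whole filter list; intended to run once per query. */
  static List<Evaluator> toEvaluators(List<Filter> filters) {
    List<Evaluator> evaluators = new ArrayList<>();
    for (Filter f : filters) {
      evaluators.add(toEvaluator(f));
    }
    return evaluators;
  }

  /** Phase 2: a file is kept only if every evaluator says it may match. */
  static boolean mayMatch(List<Evaluator> evaluators, Map<String, ColumnStats> stats) {
    for (Evaluator e : evaluators) {
      if (!e.eval(stats)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Evaluator> evaluators = toEvaluators(Arrays.asList(new Filter("age", ">", 30)));
    Map<String, ColumnStats> fileStats = new HashMap<>();
    fileStats.put("age", new ColumnStats(1, 20));
    System.out.println("file may match: " + mayMatch(evaluators, fileStats));
  }
}
```

In this sketch the evaluator list is built once and then applied to the statistics of each file, so the per-file cost is only the cheap `eval` calls.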
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   NA
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455838202

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * eae2387764ca35c7bcec750ebb9703a948a39da1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583) 
   * 05be3f95f31f533f2e374ac00f72ca50bff68b17 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1448509907

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 1845d37eb0f1d5c660ce83550816d75ca48b768c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419) 
   * 9f10561c4186e7da83c113a8f09901e14870519e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1445647627

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 1845d37eb0f1d5c660ce83550816d75ca48b768c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419) 
   


[GitHub] [hudi] beyond1920 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127450110


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java:
##########
@@ -213,24 +213,17 @@ public void setFilters(List<ResolvedExpression> filters) {
    */
   @Nullable
   private Set<String> candidateFilesInMetadataTable(FileStatus[] allFileStatus) {
-    // NOTE: Data Skipping is only effective when it references columns that are indexed w/in
-    //       the Column Stats Index (CSI). Following cases could not be effectively handled by Data Skipping:
-    //          - Expressions on top-level column's fields (ie, for ex filters like "struct.field > 0", since
-    //          CSI only contains stats for top-level columns, in this case for "struct")
-    //          - Any expression not directly referencing top-level column (for ex, sub-queries, since there's
-    //          nothing CSI in particular could be applied for)
-    if (!metadataConfig.enabled() || !dataSkippingEnabled) {
-      validateConfig();
-      return null;
+    // initialize data skipper if it's not initialized yet
+    if (this.dataPrunerOpt == null) {
+      DataPruner dataPruner = initializeDataPruner();

Review Comment:
   Of course, but that requires refactoring `HoodieTableSource`.
   Currently, the `FileIndex` instance is created in the `HoodieTableSource` constructor, but the `filters` only become available during filter push-down, and the `fileIndex` is already used before filter push-down.
   I just want to confirm that it is worth doing this refactoring to initialize the pruner in the `FileIndex` constructor before I start.
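The ordering constraint described in this comment (the index object exists before the filters do) is what motivates lazy initialization. The following is a minimal sketch under stated assumptions, not the PR's `FileIndex`: `FileIndexLike`, `candidateFiles` and `buildPruner` are hypothetical names, and each filter is reduced to a required file-name prefix purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; names and filter semantics are illustrative only.
final class LazyPrunerSketch {

  static final class FileIndexLike {
    private List<String> filters = Collections.emptyList();
    private Predicate<String> pruner; // stays null until the first lookup
    int buildCount = 0;               // exposed only to make the caching observable

    /** Filters arrive late (for example during push-down), so drop any stale pruner. */
    void setFilters(List<String> newFilters) {
      this.filters = new ArrayList<>(newFilters);
      this.pruner = null;
    }

    /** Builds the pruner on first use and reuses it on later calls. */
    List<String> candidateFiles(List<String> allFiles) {
      if (pruner == null) {
        pruner = buildPruner();
      }
      List<String> kept = new ArrayList<>();
      for (String file : allFiles) {
        if (pruner.test(file)) {
          kept.add(file);
        }
      }
      return kept;
    }

    /** The comparatively expensive step that should not run on every lookup. */
    private Predicate<String> buildPruner() {
      buildCount++;
      List<String> prefixes = new ArrayList<>(filters);
      // With no filters, every file is a candidate.
      return file -> prefixes.isEmpty() || prefixes.stream().anyMatch(file::startsWith);
    }
  }

  public static void main(String[] args) {
    FileIndexLike index = new FileIndexLike();
    index.setFilters(Collections.singletonList("a"));
    List<String> files = new ArrayList<>();
    files.add("a1");
    files.add("b1");
    System.out.println(index.candidateFiles(files));
    System.out.println(index.candidateFiles(files));
    System.out.println("pruner built " + index.buildCount + " time(s)");
  }
}
```

Building the pruner in the constructor instead would remove the null check, but, as the comment notes, that only works if the filters are already known when the object is created.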





[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1458213432

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 2d911f1e7fbf288b5abf1b1bc985232f34eb36a3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15607) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1129059876


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/EvaluatorUtils.java:
##########
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.source.evaluator.AlwaysFalse;
+import org.apache.hudi.source.evaluator.And;
+import org.apache.hudi.source.evaluator.EqualTo;
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.evaluator.GreaterThan;
+import org.apache.hudi.source.evaluator.GreaterThanOrEqual;
+import org.apache.hudi.source.evaluator.In;
+import org.apache.hudi.source.evaluator.IsNotNull;
+import org.apache.hudi.source.evaluator.IsNull;
+import org.apache.hudi.source.evaluator.LessThan;
+import org.apache.hudi.source.evaluator.LessThanOrEqual;
+import org.apache.hudi.source.evaluator.Not;
+import org.apache.hudi.source.evaluator.NotEqualTo;
+import org.apache.hudi.source.evaluator.NullFalseEvaluator;
+import org.apache.hudi.source.evaluator.Or;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import javax.validation.constraints.NotNull;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Utilities for evaluators.
+ */
+public class EvaluatorUtils {
+
+  /**

Review Comment:
   Understood. My point is that the `Evaluator` class is not a user API; it is a developer API, and I want to hide the complexity of the different evaluator implementations. We would expose only the tool class `ExpressionEvaluators` to the invoker. That is the cleanest way I can think of to structure the code.
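One way to read this suggestion is as a facade: a single utility class is visible to callers while the concrete evaluators stay private. The sketch below only illustrates that shape; `EvaluatorFacadeSketch` and its members are hypothetical, and the `eval(min, max)` signature is a simplification rather than the interface used in the PR.

```java
// Hypothetical sketch of a facade over private evaluator implementations.
final class EvaluatorFacadeSketch {

  private EvaluatorFacadeSketch() {
    // Utility class, not meant to be instantiated.
  }

  /** The only type callers see besides the factory methods below. */
  interface Evaluator {
    /** Returns true if a file whose column spans [min, max] may contain a match. */
    boolean eval(long min, long max);
  }

  static Evaluator greaterThan(long value) {
    return new GreaterThan(value);
  }

  static Evaluator lessThan(long value) {
    return new LessThan(value);
  }

  static Evaluator and(Evaluator left, Evaluator right) {
    return new And(left, right);
  }

  // Concrete evaluators are private, so callers cannot depend on them directly.
  private static final class GreaterThan implements Evaluator {
    private final long value;

    GreaterThan(long value) {
      this.value = value;
    }

    @Override
    public boolean eval(long min, long max) {
      return max > value;
    }
  }

  private static final class LessThan implements Evaluator {
    private final long value;

    LessThan(long value) {
      this.value = value;
    }

    @Override
    public boolean eval(long min, long max) {
      return min < value;
    }
  }

  private static final class And implements Evaluator {
    private final Evaluator left;
    private final Evaluator right;

    And(Evaluator left, Evaluator right) {
      this.left = left;
      this.right = right;
    }

    @Override
    public boolean eval(long min, long max) {
      return left.eval(min, max) && right.eval(min, max);
    }
  }

  public static void main(String[] args) {
    Evaluator between = and(greaterThan(10), lessThan(20));
    System.out.println(between.eval(12, 30));
  }
}
```

The trade-off raised in the thread still applies: nesting everything in one class keeps the surface small, while separate top-level classes can be easier to maintain as the number of evaluators grows.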





[GitHub] [hudi] beyond1920 commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1453158013

   @danny0405 Please see `FileIndex#candidateFilesInMetadataTable`.
   There is no need to convert the filters to evaluators every time this method is executed.
   BTW, the main goal of this PR is actually to refactor `ExpressionEvaluators` by splitting it into two phases: convert the expressions to evaluators, then evaluate whether a match is possible based on the column stats.




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1457895656

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 68f16fb8f6dd8bb8209581ab250baf34b253c302 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592) 
   * 2d911f1e7fbf288b5abf1b1bc985232f34eb36a3 UNKNOWN
   


[GitHub] [hudi] beyond1920 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1129325454


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/EvaluatorUtils.java:
##########
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.source.evaluator.AlwaysFalse;
+import org.apache.hudi.source.evaluator.And;
+import org.apache.hudi.source.evaluator.EqualTo;
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.evaluator.GreaterThan;
+import org.apache.hudi.source.evaluator.GreaterThanOrEqual;
+import org.apache.hudi.source.evaluator.In;
+import org.apache.hudi.source.evaluator.IsNotNull;
+import org.apache.hudi.source.evaluator.IsNull;
+import org.apache.hudi.source.evaluator.LessThan;
+import org.apache.hudi.source.evaluator.LessThanOrEqual;
+import org.apache.hudi.source.evaluator.Not;
+import org.apache.hudi.source.evaluator.NotEqualTo;
+import org.apache.hudi.source.evaluator.NullFalseEvaluator;
+import org.apache.hudi.source.evaluator.Or;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import javax.validation.constraints.NotNull;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Utilities for evaluators.
+ */
+public class EvaluatorUtils {
+
+  /**

Review Comment:
   Thanks for the explanation.
   I still hold a different opinion on this point: maintainability matters even for internal implementations.
   But if you insist, I'm OK with updating the PR as you suggest.





[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455545160

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * eae2387764ca35c7bcec750ebb9703a948a39da1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1459509810

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 2d911f1e7fbf288b5abf1b1bc985232f34eb36a3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15607) 
   * 2b559390d0b7bed0926f4f536106ac7e3741003f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15614) 
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1129057645


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/DataPruner.java:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.stats.ColumnStats;
+import org.apache.hudi.util.ExpressionUtils;
+
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.types.logical.TimestampType;
+
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hudi.util.EvaluatorUtils.fromExpression;
+
+/**
+ * Utility to do data skipping.
+ */
+public class DataPruner {
+  private final String[] referencedCols;
+  private final List<Evaluator> evaluators;
+
+  private DataPruner(String[] referencedCols, List<Evaluator> evaluators) {
+    this.referencedCols = referencedCols;
+    this.evaluators = evaluators;
+  }
+
+  /**
+   * Filters the index row with specific data filters and query fields.
+   *
+   * @param indexRow    The index row
+   * @param queryFields The query fields referenced by the filters
+   * @return true if the index row should be considered as a candidate
+   */
+  public boolean test(RowData indexRow, RowType.RowField[] queryFields) {
+    Map<String, ColumnStats> columnStatsMap = convertColumnStats(indexRow, queryFields);
+    for (Evaluator evaluator : evaluators) {
+      if (!evaluator.eval(columnStatsMap)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  public String[] getReferencedCols() {
+    return referencedCols;
+  }
+
+  public static DataPruner newInstance(List<ResolvedExpression> filters) {
+    if (filters == null || filters.size() == 0) {
+      return null;
+    }
+    String[] referencedCols = ExpressionUtils.referencedColumns(filters);
+    if (referencedCols.length == 0) {
+      return null;
+    }
+    List<Evaluator> evaluators = fromExpression(filters);
+    return new DataPruner(referencedCols, evaluators);
+  }
+
+  public static Map<String, ColumnStats> convertColumnStats(RowData indexRow, RowType.RowField[] queryFields) {
+    Map<String, ColumnStats> mapping = new LinkedHashMap<>();
+    if (indexRow == null || queryFields == null) {
+      return mapping;

Review Comment:
   Throwing `AssertionError` is not good coding practice; let's not use it.
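
One possible way to act on this comment, sketched under assumptions: validate the arguments explicitly and fail with a descriptive exception instead of an assertion-style error. The class `NullCheckSketch` and its simplified `convertColumnStats` signature (plain arrays and strings instead of `RowData`, `RowType.RowField` and `ColumnStats`) are hypothetical stand-ins, not the code from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the reviewed method; the types are simplified
// so that the sketch compiles without Flink or Hudi on the classpath.
class NullCheckSketch {

  static Map<String, String> convertColumnStats(Object[] indexRow, String[] queryFields) {
    // Fail fast with a clear message rather than an assertion-style error.
    Objects.requireNonNull(indexRow, "indexRow must not be null");
    Objects.requireNonNull(queryFields, "queryFields must not be null");
    if (indexRow.length != queryFields.length) {
      throw new IllegalArgumentException(
          "expected " + queryFields.length + " values but got " + indexRow.length);
    }
    // Keep insertion order, as the LinkedHashMap in the diff does.
    Map<String, String> mapping = new LinkedHashMap<>();
    for (int i = 0; i < queryFields.length; i++) {
      mapping.put(queryFields[i], String.valueOf(indexRow[i]));
    }
    return mapping;
  }
}
```

Whether a null argument should be rejected or silently mapped to an empty result depends on the callers, which this excerpt does not show.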





[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1449084470

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 9f10561c4186e7da83c113a8f09901e14870519e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485) 
   




[GitHub] [hudi] XuQianJin-Stars commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "XuQianJin-Stars (via GitHub)" <gi...@apache.org>.
XuQianJin-Stars commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1447702509

   Thanks @beyond1920, nice PR.




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1453551330

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * cb8ea974a3b06a5d580ac9ac095688be3e0c4595 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455410052

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * cb8ea974a3b06a5d580ac9ac095688be3e0c4595 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549) 
   * eae2387764ca35c7bcec750ebb9703a948a39da1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455911514

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * eae2387764ca35c7bcec750ebb9703a948a39da1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583) 
   * 05be3f95f31f533f2e374ac00f72ca50bff68b17 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589) 
   * 68f16fb8f6dd8bb8209581ab250baf34b253c302 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1445643716

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 1845d37eb0f1d5c660ce83550816d75ca48b768c UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1445637565

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127308939


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/DataPruner.java:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.stats.ColumnStats;
+import org.apache.hudi.util.ExpressionUtils;
+
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.types.logical.TimestampType;
+
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hudi.util.EvaluatorUtils.fromExpression;
+
+/**
+ * Utility to do data skipping.
+ */
+public class DataPruner {
+  private final String[] referencedCols;
+  private final List<Evaluator> evaluators;
+
+  private DataPruner(String[] referencedCols, List<Evaluator> evaluators) {
+    this.referencedCols = referencedCols;
+    this.evaluators = evaluators;
+  }
+
+  /**
+   * Filters the index row with specific data filters and query fields.
+   *
+   * @param indexRow    The index row
+   * @param queryFields The query fields referenced by the filters
+   * @return true if the index row should be considered as a candidate
+   */
+  public boolean test(RowData indexRow, RowType.RowField[] queryFields) {
+    Map<String, ColumnStats> columnStatsMap = convertColumnStats(indexRow, queryFields);
+    for (Evaluator evaluator : evaluators) {
+      if (!evaluator.eval(columnStatsMap)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  public String[] getReferencedCols() {
+    return referencedCols;
+  }
+
+  public static DataPruner newInstance(List<ResolvedExpression> filters) {
+    if (filters == null || filters.size() == 0) {
+      return null;
+    }
+    String[] referencedCols = ExpressionUtils.referencedColumns(filters);
+    if (referencedCols.length == 0) {
+      return null;
+    }
+    List<Evaluator> evaluators = fromExpression(filters);
+    return new DataPruner(referencedCols, evaluators);
+  }
+
+  public static Map<String, ColumnStats> convertColumnStats(RowData indexRow, RowType.RowField[] queryFields) {
+    Map<String, ColumnStats> mapping = new LinkedHashMap<>();
+    if (indexRow == null || queryFields == null) {
+      return mapping;

Review Comment:
   In which cases can `indexRow` and `queryFields` be null?
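
For context on the class under review: the diff converts the filters to evaluators once in `newInstance` and then reuses them for every index row in `test`. A stripped-down sketch of that two-phase pattern follows. The names `PrunerSketch` and `Stats`, and the restriction to "column greater than literal" filters, are assumptions made for illustration; the real code works on Flink `ResolvedExpression` and Hudi `ColumnStats` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified analogue of DataPruner.
class PrunerSketch {

  // Stand-in for per-file column statistics: min and max only.
  static final class Stats {
    final long min;
    final long max;

    Stats(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  private final List<Predicate<Map<String, Stats>>> evaluators;

  private PrunerSketch(List<Predicate<Map<String, Stats>>> evaluators) {
    this.evaluators = evaluators;
  }

  // Phase 1: convert each "column > literal" filter to an evaluator, once.
  static PrunerSketch newInstance(Map<String, Long> greaterThanFilters) {
    List<Predicate<Map<String, Stats>>> evals = new ArrayList<>();
    for (Map.Entry<String, Long> filter : greaterThanFilters.entrySet()) {
      String col = filter.getKey();
      long literal = filter.getValue();
      // A file can hold matching rows only if its max exceeds the literal.
      // Missing stats are treated conservatively: the file is kept.
      evals.add(stats -> {
        Stats s = stats.get(col);
        return s == null || s.max > literal;
      });
    }
    return new PrunerSketch(evals);
  }

  // Phase 2: run the prebuilt evaluators against one file's stats.
  boolean test(Map<String, Stats> columnStats) {
    for (Predicate<Map<String, Stats>> evaluator : evaluators) {
      if (!evaluator.test(columnStats)) {
        return false; // no row in this file can satisfy the filter
      }
    }
    return true;
  }
}
```

Because the evaluators are built once, the cost of calling `test` for each file is limited to the stats lookups and comparisons.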





[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1457975075

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 68f16fb8f6dd8bb8209581ab250baf34b253c302 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592) 
   * 2d911f1e7fbf288b5abf1b1bc985232f34eb36a3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15607) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1459482705

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 2d911f1e7fbf288b5abf1b1bc985232f34eb36a3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15607) 
   * 2b559390d0b7bed0926f4f536106ac7e3741003f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1453332833

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 9f10561c4186e7da83c113a8f09901e14870519e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485) 
   * dbebc39f00895bc9d50ca73b28780469e26a8378 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] danny0405 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127309462


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java:
##########
@@ -213,24 +213,17 @@ public void setFilters(List<ResolvedExpression> filters) {
    */
   @Nullable
   private Set<String> candidateFilesInMetadataTable(FileStatus[] allFileStatus) {
-    // NOTE: Data Skipping is only effective when it references columns that are indexed w/in
-    //       the Column Stats Index (CSI). Following cases could not be effectively handled by Data Skipping:
-    //          - Expressions on top-level column's fields (ie, for ex filters like "struct.field > 0", since
-    //          CSI only contains stats for top-level columns, in this case for "struct")
-    //          - Any expression not directly referencing top-level column (for ex, sub-queries, since there's
-    //          nothing CSI in particular could be applied for)
-    if (!metadataConfig.enabled() || !dataSkippingEnabled) {
-      validateConfig();
-      return null;
+    // initialize data skipper if it's not initialized yet
+    if (this.dataPrunerOpt == null) {
+      DataPruner dataPruner = initializeDataPruner();

Review Comment:
   Can we initialize the pruner in the `FileIndex` constructor instead? Please also make sure that `DataPruner` is serializable.
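
   A minimal sketch of what this suggestion could look like, under stated assumptions: `SketchFileIndex` and `SketchPruner` are hypothetical stand-ins (not the real `FileIndex` or `DataPruner` from this PR), filters are plain strings rather than `ResolvedExpression`, and the filters are assumed to be available at construction time. It only illustrates building the pruner eagerly in the constructor and checking that it survives Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrunerInitSketch {

  /** Hypothetical stand-in for DataPruner: holds the filters converted once. */
  static final class SketchPruner implements Serializable {
    private static final long serialVersionUID = 1L;
    // ArrayList is Serializable, so the whole pruner can be shipped to tasks.
    private final ArrayList<String> evaluators;

    SketchPruner(List<String> filters) {
      this.evaluators = new ArrayList<>(filters);
    }

    int evaluatorCount() {
      return evaluators.size();
    }
  }

  /** Hypothetical stand-in for FileIndex: the pruner is created in the constructor. */
  static final class SketchFileIndex implements Serializable {
    private static final long serialVersionUID = 1L;
    // null means data skipping is disabled; no lazy initialization is needed later.
    private final SketchPruner pruner;

    SketchFileIndex(boolean dataSkippingEnabled, List<String> filters) {
      this.pruner = dataSkippingEnabled ? new SketchPruner(filters) : null;
    }

    SketchPruner pruner() {
      return pruner;
    }
  }

  /** Serializes and deserializes a value to confirm it is actually serializable. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("value is not serializable", e);
    }
  }

  public static void main(String[] args) {
    SketchFileIndex index = new SketchFileIndex(true, Arrays.asList("a > 1", "b = 2"));
    SketchFileIndex copy = roundTrip(index);
    System.out.println("evaluators after round trip: " + copy.pruner().evaluatorCount());
  }
}
```

   In the actual `FileIndex`, filters arrive through `setFilters(...)` after construction, so the real change may need to rebuild the pruner there as well; the sketch does not cover that case.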



[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455370234

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549",
       "triggerID" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * cb8ea974a3b06a5d580ac9ac095688be3e0c4595 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549) 
   * eae2387764ca35c7bcec750ebb9703a948a39da1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455924362

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549",
       "triggerID" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583",
       "triggerID" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589",
       "triggerID" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "triggerType" : "PUSH"
     }, {
       "hash" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592",
       "triggerID" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 05be3f95f31f533f2e374ac00f72ca50bff68b17 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589) 
   * 68f16fb8f6dd8bb8209581ab250baf34b253c302 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1461820992

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549",
       "triggerID" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583",
       "triggerID" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589",
       "triggerID" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "triggerType" : "PUSH"
     }, {
       "hash" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592",
       "triggerID" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2d911f1e7fbf288b5abf1b1bc985232f34eb36a3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15607",
       "triggerID" : "2d911f1e7fbf288b5abf1b1bc985232f34eb36a3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b559390d0b7bed0926f4f536106ac7e3741003f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15614",
       "triggerID" : "2b559390d0b7bed0926f4f536106ac7e3741003f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cd1480e99f1a380e052f848370b84b1d4f4018d4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cd1480e99f1a380e052f848370b84b1d4f4018d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 2b559390d0b7bed0926f4f536106ac7e3741003f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15614) 
   * cd1480e99f1a380e052f848370b84b1d4f4018d4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1461849915

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549",
       "triggerID" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583",
       "triggerID" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589",
       "triggerID" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "triggerType" : "PUSH"
     }, {
       "hash" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592",
       "triggerID" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2d911f1e7fbf288b5abf1b1bc985232f34eb36a3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15607",
       "triggerID" : "2d911f1e7fbf288b5abf1b1bc985232f34eb36a3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b559390d0b7bed0926f4f536106ac7e3741003f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15614",
       "triggerID" : "2b559390d0b7bed0926f4f536106ac7e3741003f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cd1480e99f1a380e052f848370b84b1d4f4018d4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15634",
       "triggerID" : "cd1480e99f1a380e052f848370b84b1d4f4018d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 2b559390d0b7bed0926f4f536106ac7e3741003f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15614) 
   * cd1480e99f1a380e052f848370b84b1d4f4018d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15634) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1456288463

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15549",
       "triggerID" : "cb8ea974a3b06a5d580ac9ac095688be3e0c4595",
       "triggerType" : "PUSH"
     }, {
       "hash" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583",
       "triggerID" : "eae2387764ca35c7bcec750ebb9703a948a39da1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15589",
       "triggerID" : "05be3f95f31f533f2e374ac00f72ca50bff68b17",
       "triggerType" : "PUSH"
     }, {
       "hash" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592",
       "triggerID" : "68f16fb8f6dd8bb8209581ab250baf34b253c302",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 68f16fb8f6dd8bb8209581ab250baf34b253c302 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15592) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1453474222

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551",
       "triggerID" : "dbebc39f00895bc9d50ca73b28780469e26a8378",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * dbebc39f00895bc9d50ca73b28780469e26a8378 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1448417571

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9f10561c4186e7da83c113a8f09901e14870519e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 1845d37eb0f1d5c660ce83550816d75ca48b768c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419) 
   * 9f10561c4186e7da83c113a8f09901e14870519e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1445829819

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "70822885c9cd8df3e8e540d3febffd4a4d1dfe32",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419",
       "triggerID" : "1845d37eb0f1d5c660ce83550816d75ca48b768c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 1845d37eb0f1d5c660ce83550816d75ca48b768c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15419) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1453504147

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * dbebc39f00895bc9d50ca73b28780469e26a8378 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551) 
   * cb8ea974a3b06a5d580ac9ac095688be3e0c4595 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1453346248

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 9f10561c4186e7da83c113a8f09901e14870519e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15485) 
   * dbebc39f00895bc9d50ca73b28780469e26a8378 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15551) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1455825548

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * eae2387764ca35c7bcec750ebb9703a948a39da1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15583) 
   * 05be3f95f31f533f2e374ac00f72ca50bff68b17 UNKNOWN
   




[GitHub] [hudi] danny0405 merged pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #8051:
URL: https://github.com/apache/hudi/pull/8051




[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1462124426

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * cd1480e99f1a380e052f848370b84b1d4f4018d4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15634) 
   




[GitHub] [hudi] danny0405 commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1451302850

   > conversion from expression to evaluators frequently
   
   Thanks. Which part do you mean by "frequently"?




[GitHub] [hudi] beyond1920 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1124263634


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/stats/TestExpressionEvaluator.java:
##########
@@ -77,291 +88,291 @@ public class TestExpressionEvaluator {
 
   @Test
   void testEqualTo() {
-    ExpressionEvaluator.EqualTo equalTo = ExpressionEvaluator.EqualTo.getInstance();
+    EqualTo equalTo = EqualTo.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
     ValueLiteralExpression vExpr = new ValueLiteralExpression(12);
 
+    equalTo.bindVal(vExpr)
+        .bindFieldReference(rExpr);
     RowData indexRow1 = intIndexRow(11, 13);
-    equalTo.bindFieldReference(rExpr)
-        .bindVal(vExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(equalTo.eval(), "11 < 12 < 13");
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(equalTo.eval(stats1), "11 < 12 < 13");
 
     RowData indexRow2 = intIndexRow(12, 13);
-    equalTo.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertTrue(equalTo.eval(), "12 <= 12 < 13");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertTrue(equalTo.eval(stats2), "12 <= 12 < 13");
 
     RowData indexRow3 = intIndexRow(11, 12);
-    equalTo.bindColStats(indexRow3, queryFields(2), rExpr);
-    assertTrue(equalTo.eval(), "11 < 12 <= 12");
+    Map<String, ColumnStats> stats3 = convertColumnStats(indexRow3, queryFields(2));
+    assertTrue(equalTo.eval(stats3), "11 < 12 <= 12");
 
     RowData indexRow4 = intIndexRow(10, 11);
-    equalTo.bindColStats(indexRow4, queryFields(2), rExpr);
-    assertFalse(equalTo.eval(), "11 < 12");
+    Map<String, ColumnStats> stats4 = convertColumnStats(indexRow4, queryFields(2));
+    assertFalse(equalTo.eval(stats4), "11 < 12");
 
     RowData indexRow5 = intIndexRow(13, 14);
-    equalTo.bindColStats(indexRow5, queryFields(2), rExpr);
-    assertFalse(equalTo.eval(), "12 < 13");
+    Map<String, ColumnStats> stats5 = convertColumnStats(indexRow5, queryFields(2));
+    assertFalse(equalTo.eval(stats5), "12 < 13");
 
     RowData indexRow6 = intIndexRow(null, null);
-    equalTo.bindColStats(indexRow6, queryFields(2), rExpr);
-    assertFalse(equalTo.eval(), "12 <> null");
+    Map<String, ColumnStats> stats6 = convertColumnStats(indexRow6, queryFields(2));
+    assertFalse(equalTo.eval(stats6), "12 <> null");
 
     equalTo.bindVal(new ValueLiteralExpression(null, DataTypes.INT()));
-    assertFalse(equalTo.eval(), "It is not possible to test for NULL values with '=' operator");
+    assertFalse(equalTo.eval(stats1), "It is not possible to test for NULL values with '=' operator");
   }
 
   @Test
   void testNotEqualTo() {
-    ExpressionEvaluator.NotEqualTo notEqualTo = ExpressionEvaluator.NotEqualTo.getInstance();
+    NotEqualTo notEqualTo = NotEqualTo.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
     ValueLiteralExpression vExpr = new ValueLiteralExpression(12);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    notEqualTo.bindFieldReference(rExpr)
-        .bindVal(vExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(notEqualTo.eval(), "11 <> 12 && 12 <> 13");
+    notEqualTo.bindVal(vExpr)
+        .bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(notEqualTo.eval(stats1), "11 <> 12 && 12 <> 13");
 
     RowData indexRow2 = intIndexRow(12, 13);
-    notEqualTo.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertTrue(notEqualTo.eval(), "12 <> 13");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertTrue(notEqualTo.eval(stats2), "12 <> 13");
 
     RowData indexRow3 = intIndexRow(11, 12);
-    notEqualTo.bindColStats(indexRow3, queryFields(2), rExpr);
-    assertTrue(notEqualTo.eval(), "11 <> 12");
+    Map<String, ColumnStats> stats3 = convertColumnStats(indexRow3, queryFields(2));
+    assertTrue(notEqualTo.eval(stats3), "11 <> 12");
 
     RowData indexRow4 = intIndexRow(10, 11);
-    notEqualTo.bindColStats(indexRow4, queryFields(2), rExpr);
-    assertTrue(notEqualTo.eval(), "10 <> 12 and 11 < 12");
+    Map<String, ColumnStats> stats4 = convertColumnStats(indexRow4, queryFields(2));
+    assertTrue(notEqualTo.eval(stats4), "10 <> 12 and 11 < 12");
 
     RowData indexRow5 = intIndexRow(13, 14);
-    notEqualTo.bindColStats(indexRow5, queryFields(2), rExpr);
-    assertTrue(notEqualTo.eval(), "12 <> 13 and 12 <> 14");
+    Map<String, ColumnStats> stats5 = convertColumnStats(indexRow5, queryFields(2));
+    assertTrue(notEqualTo.eval(stats5), "12 <> 13 and 12 <> 14");
 
     RowData indexRow6 = intIndexRow(null, null);
-    notEqualTo.bindColStats(indexRow6, queryFields(2), rExpr);
-    assertTrue(notEqualTo.eval(), "12 <> null");
+    Map<String, ColumnStats> stats6 = convertColumnStats(indexRow6, queryFields(2));
+    assertTrue(notEqualTo.eval(stats6), "12 <> null");
 
     notEqualTo.bindVal(new ValueLiteralExpression(null, DataTypes.INT()));
-    assertFalse(notEqualTo.eval(), "It is not possible to test for NULL values with '<>' operator");
+    assertFalse(notEqualTo.eval(stats1), "It is not possible to test for NULL values with '<>' operator");
   }
 
   @Test
   void testIsNull() {
-    ExpressionEvaluator.IsNull isNull = ExpressionEvaluator.IsNull.getInstance();
+    IsNull isNull = IsNull.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    isNull.bindFieldReference(rExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(isNull.eval(), "2 nulls");
+    isNull.bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(isNull.eval(stats1), "2 nulls");
 
     RowData indexRow2 = intIndexRow(12, 13, 0L);
-    isNull.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertFalse(isNull.eval(), "0 nulls");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertFalse(isNull.eval(stats2), "0 nulls");
   }
 
   @Test
   void testIsNotNull() {
-    ExpressionEvaluator.IsNotNull isNotNull = ExpressionEvaluator.IsNotNull.getInstance();
+    IsNotNull isNotNull = IsNotNull.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    isNotNull.bindFieldReference(rExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(isNotNull.eval(), "min 11 is not null");
+    isNotNull.bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(isNotNull.eval(stats1), "min 11 is not null");
 
     RowData indexRow2 = intIndexRow(null, null, 0L);
-    isNotNull.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertTrue(isNotNull.eval(), "min is null and 0 nulls");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertTrue(isNotNull.eval(stats2), "min is null and 0 nulls");
   }
 
   @Test
   void testLessThan() {
-    ExpressionEvaluator.LessThan lessThan = ExpressionEvaluator.LessThan.getInstance();
+    LessThan lessThan = LessThan.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
     ValueLiteralExpression vExpr = new ValueLiteralExpression(12);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    lessThan.bindFieldReference(rExpr)
-        .bindVal(vExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(lessThan.eval(), "12 < 13");
+    lessThan.bindVal(vExpr)
+        .bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(lessThan.eval(stats1), "12 < 13");
 
     RowData indexRow2 = intIndexRow(12, 13);
-    lessThan.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertFalse(lessThan.eval(), "min 12 = 12");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertFalse(lessThan.eval(stats2), "min 12 = 12");
 
     RowData indexRow3 = intIndexRow(11, 12);
-    lessThan.bindColStats(indexRow3, queryFields(2), rExpr);
-    assertTrue(lessThan.eval(), "11 < 12");
+    Map<String, ColumnStats> stats3 = convertColumnStats(indexRow3, queryFields(2));
+    assertTrue(lessThan.eval(stats3), "11 < 12");
 
     RowData indexRow4 = intIndexRow(10, 11);
-    lessThan.bindColStats(indexRow4, queryFields(2), rExpr);
-    assertTrue(lessThan.eval(), "11 < 12");
+    Map<String, ColumnStats> stats4 = convertColumnStats(indexRow4, queryFields(2));
+    assertTrue(lessThan.eval(stats4), "11 < 12");
 
     RowData indexRow5 = intIndexRow(13, 14);
-    lessThan.bindColStats(indexRow5, queryFields(2), rExpr);
-    assertFalse(lessThan.eval(), "12 < min 13");
+    Map<String, ColumnStats> stats5 = convertColumnStats(indexRow5, queryFields(2));
+    assertFalse(lessThan.eval(stats5), "12 < min 13");
 
     RowData indexRow6 = intIndexRow(null, null);
-    lessThan.bindColStats(indexRow6, queryFields(2), rExpr);
-    assertFalse(lessThan.eval(), "12 <> null");
+    Map<String, ColumnStats> stats6 = convertColumnStats(indexRow6, queryFields(2));
+    assertFalse(lessThan.eval(stats6), "12 <> null");
 
     lessThan.bindVal(new ValueLiteralExpression(null, DataTypes.INT()));
-    assertFalse(lessThan.eval(), "It is not possible to test for NULL values with '<' operator");
+    assertFalse(lessThan.eval(stats1), "It is not possible to test for NULL values with '<' operator");
   }
 
   @Test
   void testGreaterThan() {
-    ExpressionEvaluator.GreaterThan greaterThan = ExpressionEvaluator.GreaterThan.getInstance();
+    GreaterThan greaterThan = GreaterThan.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
     ValueLiteralExpression vExpr = new ValueLiteralExpression(12);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    greaterThan.bindFieldReference(rExpr)
-        .bindVal(vExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(greaterThan.eval(), "12 < 13");
+    greaterThan.bindVal(vExpr)
+        .bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(greaterThan.eval(stats1), "12 < 13");
 
     RowData indexRow2 = intIndexRow(12, 13);
-    greaterThan.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertTrue(greaterThan.eval(), "12 < 13");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertTrue(greaterThan.eval(stats2), "12 < 13");
 
     RowData indexRow3 = intIndexRow(11, 12);
-    greaterThan.bindColStats(indexRow3, queryFields(2), rExpr);
-    assertFalse(greaterThan.eval(), "max 12 = 12");
+    Map<String, ColumnStats> stats3 = convertColumnStats(indexRow3, queryFields(2));
+    assertFalse(greaterThan.eval(stats3), "max 12 = 12");
 
     RowData indexRow4 = intIndexRow(10, 11);
-    greaterThan.bindColStats(indexRow4, queryFields(2), rExpr);
-    assertFalse(greaterThan.eval(), "max 11 < 12");
+    Map<String, ColumnStats> stats4 = convertColumnStats(indexRow4, queryFields(2));
+    assertFalse(greaterThan.eval(stats4), "max 11 < 12");
 
     RowData indexRow5 = intIndexRow(13, 14);
-    greaterThan.bindColStats(indexRow5, queryFields(2), rExpr);
-    assertTrue(greaterThan.eval(), "12 < 13");
+    Map<String, ColumnStats> stats5 = convertColumnStats(indexRow5, queryFields(2));
+    assertTrue(greaterThan.eval(stats5), "12 < 13");
 
     RowData indexRow6 = intIndexRow(null, null);
-    greaterThan.bindColStats(indexRow6, queryFields(2), rExpr);
-    assertFalse(greaterThan.eval(), "12 <> null");
+    Map<String, ColumnStats> stats6 = convertColumnStats(indexRow6, queryFields(2));
+    assertFalse(greaterThan.eval(stats6), "12 <> null");
 
     greaterThan.bindVal(new ValueLiteralExpression(null, DataTypes.INT()));
-    assertFalse(greaterThan.eval(), "It is not possible to test for NULL values with '>' operator");
+    assertFalse(greaterThan.eval(stats1), "It is not possible to test for NULL values with '>' operator");
   }
 
   @Test
   void testLessThanOrEqual() {
-    ExpressionEvaluator.LessThanOrEqual lessThanOrEqual = ExpressionEvaluator.LessThanOrEqual.getInstance();
+    LessThanOrEqual lessThanOrEqual = LessThanOrEqual.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
     ValueLiteralExpression vExpr = new ValueLiteralExpression(12);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    lessThanOrEqual.bindFieldReference(rExpr)
-        .bindVal(vExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(lessThanOrEqual.eval(), "11 < 12");
+    lessThanOrEqual.bindVal(vExpr)
+        .bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(lessThanOrEqual.eval(stats1), "11 < 12");
 
     RowData indexRow2 = intIndexRow(12, 13);
-    lessThanOrEqual.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertTrue(lessThanOrEqual.eval(), "min 12 = 12");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertTrue(lessThanOrEqual.eval(stats2), "min 12 = 12");
 
     RowData indexRow3 = intIndexRow(11, 12);
-    lessThanOrEqual.bindColStats(indexRow3, queryFields(2), rExpr);
-    assertTrue(lessThanOrEqual.eval(), "max 12 = 12");
+    Map<String, ColumnStats> stats3 = convertColumnStats(indexRow3, queryFields(2));
+    assertTrue(lessThanOrEqual.eval(stats3), "max 12 = 12");
 
     RowData indexRow4 = intIndexRow(10, 11);
-    lessThanOrEqual.bindColStats(indexRow4, queryFields(2), rExpr);
-    assertTrue(lessThanOrEqual.eval(), "max 11 < 12");
+    Map<String, ColumnStats> stats4 = convertColumnStats(indexRow4, queryFields(2));
+    assertTrue(lessThanOrEqual.eval(stats4), "max 11 < 12");
 
     RowData indexRow5 = intIndexRow(13, 14);
-    lessThanOrEqual.bindColStats(indexRow5, queryFields(2), rExpr);
-    assertFalse(lessThanOrEqual.eval(), "12 < 13");
+    Map<String, ColumnStats> stats5 = convertColumnStats(indexRow5, queryFields(2));
+    assertFalse(lessThanOrEqual.eval(stats5), "12 < 13");
 
     RowData indexRow6 = intIndexRow(null, null);
-    lessThanOrEqual.bindColStats(indexRow6, queryFields(2), rExpr);
-    assertFalse(lessThanOrEqual.eval(), "12 <> null");
+    Map<String, ColumnStats> stats6 = convertColumnStats(indexRow6, queryFields(2));
+    assertFalse(lessThanOrEqual.eval(stats6), "12 <> null");
 
     lessThanOrEqual.bindVal(new ValueLiteralExpression(null, DataTypes.INT()));
-    assertFalse(lessThanOrEqual.eval(), "It is not possible to test for NULL values with '<=' operator");
+    assertFalse(lessThanOrEqual.eval(stats1), "It is not possible to test for NULL values with '<=' operator");
   }
 
   @Test
   void testGreaterThanOrEqual() {
-    ExpressionEvaluator.GreaterThanOrEqual greaterThanOrEqual = ExpressionEvaluator.GreaterThanOrEqual.getInstance();
+    GreaterThanOrEqual greaterThanOrEqual = GreaterThanOrEqual.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
     ValueLiteralExpression vExpr = new ValueLiteralExpression(12);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    greaterThanOrEqual.bindFieldReference(rExpr)
-        .bindVal(vExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    assertTrue(greaterThanOrEqual.eval(), "12 < 13");
+    greaterThanOrEqual.bindVal(vExpr)
+        .bindFieldReference(rExpr);
+    Map<String, ColumnStats> stats1 = convertColumnStats(indexRow1, queryFields(2));
+    assertTrue(greaterThanOrEqual.eval(stats1), "12 < 13");
 
     RowData indexRow2 = intIndexRow(12, 13);
-    greaterThanOrEqual.bindColStats(indexRow2, queryFields(2), rExpr);
-    assertTrue(greaterThanOrEqual.eval(), "min 12 = 12");
+    Map<String, ColumnStats> stats2 = convertColumnStats(indexRow2, queryFields(2));
+    assertTrue(greaterThanOrEqual.eval(stats2), "min 12 = 12");
 
     RowData indexRow3 = intIndexRow(11, 12);
-    greaterThanOrEqual.bindColStats(indexRow3, queryFields(2), rExpr);
-    assertTrue(greaterThanOrEqual.eval(), "max 12 = 12");
+    Map<String, ColumnStats> stats3 = convertColumnStats(indexRow3, queryFields(2));
+    assertTrue(greaterThanOrEqual.eval(stats3), "max 12 = 12");
 
     RowData indexRow4 = intIndexRow(10, 11);
-    greaterThanOrEqual.bindColStats(indexRow4, queryFields(2), rExpr);
-    assertFalse(greaterThanOrEqual.eval(), "max 11 < 12");
+    Map<String, ColumnStats> stats4 = convertColumnStats(indexRow4, queryFields(2));
+    assertFalse(greaterThanOrEqual.eval(stats4), "max 11 < 12");
 
     RowData indexRow5 = intIndexRow(13, 14);
-    greaterThanOrEqual.bindColStats(indexRow5, queryFields(2), rExpr);
-    assertTrue(greaterThanOrEqual.eval(), "12 < 13");
+    Map<String, ColumnStats> stats5 = convertColumnStats(indexRow5, queryFields(2));
+    assertTrue(greaterThanOrEqual.eval(stats5), "12 < 13");
 
     RowData indexRow6 = intIndexRow(null, null);
-    greaterThanOrEqual.bindColStats(indexRow6, queryFields(2), rExpr);
-    assertFalse(greaterThanOrEqual.eval(), "12 <> null");
+    Map<String, ColumnStats> stats6 = convertColumnStats(indexRow6, queryFields(2));
+    assertFalse(greaterThanOrEqual.eval(stats6), "12 <> null");
 
     greaterThanOrEqual.bindVal(new ValueLiteralExpression(null, DataTypes.INT()));
-    assertFalse(greaterThanOrEqual.eval(), "It is not possible to test for NULL values with '>=' operator");
+    assertFalse(greaterThanOrEqual.eval(stats1), "It is not possible to test for NULL values with '>=' operator");
   }
 
   @Test
   void testIn() {
-    ExpressionEvaluator.In in = ExpressionEvaluator.In.getInstance();
+    In in = In.getInstance();
     FieldReferenceExpression rExpr = new FieldReferenceExpression("f_int", DataTypes.INT(), 2, 2);
 
     RowData indexRow1 = intIndexRow(11, 13);
-    in.bindFieldReference(rExpr)
-        .bindColStats(indexRow1, queryFields(2), rExpr);
-    in.bindVals(12);
-    assertTrue(in.eval(), "11 < 12 < 13");
+    in.bindFieldReference(rExpr);
+    in.bindVals(11, 12);

Review Comment:
   In this PR, I also fixed a small bug in the `In` evaluator.
   I modified the test case so that the IN expression has multiple literals, to validate that the behavior is now correct.
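
   For readers following the thread, here is a minimal sketch of how an IN check against min/max column stats can handle several literals. It is illustrative only and is not the code from this PR: `InStatsSketch` and `IntColumnStats` are hypothetical names, and the real `In` evaluator works on Flink expressions and Hudi `ColumnStats`. The literals are bound once and the same instance is then evaluated against the stats of each file, which mirrors the bind-once idea of the refactoring.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the In evaluator from this PR.
final class InStatsSketch {

  /** Min/max of one int column in one file; null bounds mean the column holds only nulls. */
  static final class IntColumnStats {
    final Integer min;
    final Integer max;

    IntColumnStats(Integer min, Integer max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Literals are bound once and reused for every file's stats. */
  private final List<Integer> literals;

  InStatsSketch(Integer... literals) {
    this.literals = Arrays.asList(literals);
  }

  /** Returns true if the file may contain a row equal to any of the bound literals. */
  boolean eval(IntColumnStats stats) {
    if (stats.min == null || stats.max == null) {
      return false; // only nulls in this file, so IN cannot match
    }
    for (Integer literal : literals) {
      // A null literal never matches; any other literal matches if it lies in [min, max].
      if (literal != null && stats.min <= literal && literal <= stats.max) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    InStatsSketch in = new InStatsSketch(11, 12);
    System.out.println(in.eval(new IntColumnStats(11, 13)));     // true
    System.out.println(in.eval(new IntColumnStats(13, 14)));     // false
    System.out.println(in.eval(new IntColumnStats(null, null))); // false
  }
}
```

   The file is kept as soon as one literal falls inside the [min, max] range, so a test with a single literal cannot tell whether every literal is actually checked.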





[GitHub] [hudi] danny0405 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127308123


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/EvaluatorUtils.java:
##########
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.source.evaluator.AlwaysFalse;
+import org.apache.hudi.source.evaluator.And;
+import org.apache.hudi.source.evaluator.EqualTo;
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.evaluator.GreaterThan;
+import org.apache.hudi.source.evaluator.GreaterThanOrEqual;
+import org.apache.hudi.source.evaluator.In;
+import org.apache.hudi.source.evaluator.IsNotNull;
+import org.apache.hudi.source.evaluator.IsNull;
+import org.apache.hudi.source.evaluator.LessThan;
+import org.apache.hudi.source.evaluator.LessThanOrEqual;
+import org.apache.hudi.source.evaluator.Not;
+import org.apache.hudi.source.evaluator.NotEqualTo;
+import org.apache.hudi.source.evaluator.NullFalseEvaluator;
+import org.apache.hudi.source.evaluator.Or;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import javax.validation.constraints.NotNull;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Utilities for evaluators.
+ */
+public class EvaluatorUtils {
+
+  /**

Review Comment:
   Can we rename the class to `ExpressionEvaluators` instead and move all the evaluator classes into it as inner classes?





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127450583


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/DataPruner.java:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.stats.ColumnStats;
+import org.apache.hudi.util.ExpressionUtils;
+
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.types.logical.TimestampType;
+
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hudi.util.EvaluatorUtils.fromExpression;
+
+/**
+ * Utility to do data skipping.
+ */
+public class DataPruner {
+  private final String[] referencedCols;
+  private final List<Evaluator> evaluators;
+
+  private DataPruner(String[] referencedCols, List<Evaluator> evaluators) {
+    this.referencedCols = referencedCols;
+    this.evaluators = evaluators;
+  }
+
+  /**
+   * Filters the index row with specific data filters and query fields.
+   *
+   * @param indexRow    The index row
+   * @param queryFields The query fields referenced by the filters
+   * @return true if the index row should be considered as a candidate
+   */
+  public boolean test(RowData indexRow, RowType.RowField[] queryFields) {
+    Map<String, ColumnStats> columnStatsMap = convertColumnStats(indexRow, queryFields);
+    for (Evaluator evaluator : evaluators) {
+      if (!evaluator.eval(columnStatsMap)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  public String[] getReferencedCols() {
+    return referencedCols;
+  }
+
+  public static DataPruner newInstance(List<ResolvedExpression> filters) {
+    if (filters == null || filters.size() == 0) {
+      return null;
+    }
+    String[] referencedCols = ExpressionUtils.referencedColumns(filters);
+    if (referencedCols.length == 0) {
+      return null;
+    }
+    List<Evaluator> evaluators = fromExpression(filters);
+    return new DataPruner(referencedCols, evaluators);
+  }
+
+  public static Map<String, ColumnStats> convertColumnStats(RowData indexRow, RowType.RowField[] queryFields) {
+    Map<String, ColumnStats> mapping = new LinkedHashMap<>();
+    if (indexRow == null || queryFields == null) {
+      return mapping;

Review Comment:
   It is not possible to hit this branch. I could also throw a new `AssertionError` here.
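
   As a rough sketch of the fail-fast alternative (the signature is simplified and hypothetical; `ConvertStatsSketch` is not part of the PR):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConvertStatsSketch {
  // Hypothetical simplified signature: bounds[i] holds {min, max} for columns[i].
  static Map<String, int[]> convert(int[][] bounds, String[] columns) {
    if (bounds == null || columns == null) {
      // Callers are expected to always pass both arguments, so treat null as a bug
      // instead of silently returning an empty mapping.
      throw new AssertionError("bounds and columns must not be null");
    }
    Map<String, int[]> mapping = new LinkedHashMap<>();
    for (int i = 0; i < columns.length; i++) {
      mapping.put(columns[i], bounds[i]);
    }
    return mapping;
  }
}
```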





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127455864


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/EvaluatorUtils.java:
##########
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.source.evaluator.AlwaysFalse;
+import org.apache.hudi.source.evaluator.And;
+import org.apache.hudi.source.evaluator.EqualTo;
+import org.apache.hudi.source.evaluator.Evaluator;
+import org.apache.hudi.source.evaluator.GreaterThan;
+import org.apache.hudi.source.evaluator.GreaterThanOrEqual;
+import org.apache.hudi.source.evaluator.In;
+import org.apache.hudi.source.evaluator.IsNotNull;
+import org.apache.hudi.source.evaluator.IsNull;
+import org.apache.hudi.source.evaluator.LessThan;
+import org.apache.hudi.source.evaluator.LessThanOrEqual;
+import org.apache.hudi.source.evaluator.Not;
+import org.apache.hudi.source.evaluator.NotEqualTo;
+import org.apache.hudi.source.evaluator.NullFalseEvaluator;
+import org.apache.hudi.source.evaluator.Or;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import javax.validation.constraints.NotNull;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Utilities for evaluators.
+ */
+public class EvaluatorUtils {
+
+  /**

Review Comment:
   I think it's better not to move all the evaluator classes into it as inner classes. I deliberately split the original inner evaluator classes out into top-level classes in this PR because:
   1. I plan to extend each evaluator to support evaluation based on column values. A single enclosing class would be hard to maintain as each evaluator grows longer.
   2. With the utility class separated from the evaluator classes, the code is cleaner.
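
   A minimal sketch of the layout I have in mind, with each evaluator as its own top-level class that is built once and then evaluated against the stats of many files. All names here (`IntStats`, `StatsEvaluator`, `GreaterThanSketch`, `AndSketch`) are hypothetical simplifications, not the classes in this PR:

```java
import java.util.List;
import java.util.Map;

// Hypothetical min/max statistics for one integer column.
final class IntStats {
  final int min;
  final int max;

  IntStats(int min, int max) {
    this.min = min;
    this.max = max;
  }
}

// Built once from a filter expression; holds no per-file state.
interface StatsEvaluator {
  boolean eval(Map<String, IntStats> stats);
}

final class GreaterThanSketch implements StatsEvaluator {
  private final String column;
  private final int literal;

  GreaterThanSketch(String column, int literal) {
    this.column = column;
    this.literal = literal;
  }

  @Override
  public boolean eval(Map<String, IntStats> stats) {
    IntStats s = stats.get(column);
    // Without statistics the file cannot be ruled out, so keep it.
    return s == null || s.max > literal;
  }
}

final class AndSketch implements StatsEvaluator {
  private final List<StatsEvaluator> children;

  AndSketch(List<StatsEvaluator> children) {
    this.children = children;
  }

  @Override
  public boolean eval(Map<String, IntStats> stats) {
    // Every child must possibly match for the conjunction to possibly match.
    for (StatsEvaluator child : children) {
      if (!child.eval(stats)) {
        return false;
      }
    }
    return true;
  }
}
```

   Because `eval` takes the stats as an argument, the same evaluator tree can be reused for every index row without rebinding.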





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8051:
URL: https://github.com/apache/hudi/pull/8051#discussion_r1127450110


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java:
##########
@@ -213,24 +213,17 @@ public void setFilters(List<ResolvedExpression> filters) {
    */
   @Nullable
   private Set<String> candidateFilesInMetadataTable(FileStatus[] allFileStatus) {
-    // NOTE: Data Skipping is only effective when it references columns that are indexed w/in
-    //       the Column Stats Index (CSI). Following cases could not be effectively handled by Data Skipping:
-    //          - Expressions on top-level column's fields (ie, for ex filters like "struct.field > 0", since
-    //          CSI only contains stats for top-level columns, in this case for "struct")
-    //          - Any expression not directly referencing top-level column (for ex, sub-queries, since there's
-    //          nothing CSI in particular could be applied for)
-    if (!metadataConfig.enabled() || !dataSkippingEnabled) {
-      validateConfig();
-      return null;
+    // initialize data skipper if it's not initialized yet
+    if (this.dataPrunerOpt == null) {
+      DataPruner dataPruner = initializeDataPruner();

Review Comment:
   Of course, but it needs a refactor of `HoodieTableSource`.
   Currently, the `FileIndex` instance is created in the `HoodieTableSource` constructor, but the `filters` only become available during filter push-down, and the `fileIndex` is already used before filter push-down.
   I will try to do this refactoring in the next commit of this PR.
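
   Until then, the lazy initialization in the diff behaves roughly like the sketch below. `LazyIndexSketch` is hypothetical and uses strings in place of real filters and pruners; the `null` versus `Optional.empty()` distinction mirrors the `dataPrunerOpt == null` check above and is an assumption about the final shape.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class LazyIndexSketch {
  private List<String> filters = Collections.emptyList();
  // null: not initialized yet; Optional.empty(): initialized, but nothing to prune with.
  private Optional<String> prunerOpt;
  // Counts how often the (potentially expensive) conversion ran.
  int buildCount = 0;

  void setFilters(List<String> filters) {
    this.filters = filters;
    this.prunerOpt = null; // force a rebuild on next use
  }

  Optional<String> pruner() {
    if (prunerOpt == null) {
      buildCount++;
      prunerOpt = filters.isEmpty()
          ? Optional.empty()
          : Optional.of(String.join(" AND ", filters));
    }
    return prunerOpt;
  }
}
```

   Calling `pruner()` repeatedly after a single `setFilters` call runs the conversion only once.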





[GitHub] [hudi] hudi-bot commented on pull request #8051: [HUDI-5851] Improvement of data skipping, only converts expressions to evaluators once

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8051:
URL: https://github.com/apache/hudi/pull/8051#issuecomment-1459689281

   ## CI report:
   
   * 70822885c9cd8df3e8e540d3febffd4a4d1dfe32 UNKNOWN
   * 2b559390d0b7bed0926f4f536106ac7e3741003f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15614) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

