Posted to commits@hudi.apache.org by "beyond1920 (via GitHub)" <gi...@apache.org> on 2023/03/06 09:53:48 UTC

[GitHub] [hudi] beyond1920 opened a new pull request, #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

beyond1920 opened a new pull request, #8101:
URL: https://github.com/apache/hudi/pull/8101

   ### Change Logs
   This PR extends the evaluators to support evaluation based on column values.
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   NA
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1140231739


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   Opened a new [PR-8218](https://github.com/apache/hudi/pull/8218)





[GitHub] [hudi] danny0405 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136491610


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   I'm not sure I follow your idea. For an expression `Evaluator`, we first bind the field reference to it, which means you can then fetch the field value from the given row using the reference index. Partition pruning is a little different, because within one partition all the data files may share the same partition path values. But the partition values should also be included in the data row, so you can still do filtering similar to what we do for data skipping.
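
The binding idea described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `BoundEqualTo`, `bind`, and the `Object[]` row layout are hypothetical names invented for the example and are not part of the Hudi `ExpressionEvaluators` API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an equality filter that is bound to a field index once,
// then reads the field value from each row by that index.
final class BoundEqualTo {
  private final int fieldIndex;
  private final Object literal;

  private BoundEqualTo(int fieldIndex, Object literal) {
    this.fieldIndex = fieldIndex;
    this.literal = literal;
  }

  // Resolves the field name to its position in the row schema (the "binding" step).
  static BoundEqualTo bind(List<String> fieldNames, String fieldName, Object literal) {
    int index = fieldNames.indexOf(fieldName);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown field: " + fieldName);
    }
    return new BoundEqualTo(index, literal);
  }

  // Fetches the value at the bound index and compares it with the literal.
  boolean eval(Object[] row) {
    Object value = row[fieldIndex];
    return value != null && value.equals(literal);
  }

  public static void main(String[] args) {
    // The partition column p_date is assumed to be part of the row itself.
    List<String> schema = Arrays.asList("id", "name", "p_date");
    BoundEqualTo filter = BoundEqualTo.bind(schema, "p_date", "20220101");
    System.out.println(filter.eval(new Object[] {1, "a", "20220101"})); // prints "true"
    System.out.println(filter.eval(new Object[] {2, "b", "20220102"})); // prints "false"
  }
}
```

Because the partition value sits in the row like any other field, the same bound filter can be reused for partition pruning without a separate code path.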





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136513290


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   Leaving them as two different `eval` methods is cleaner.





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136512519


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   Yes. For partition pruning, the statistics of the partition columns for a given partition are an exact value rather than a range with [min, max] and nullCnt.
   Of course, we could wrap this exact value in a `ColumnStats` object whose min equals its max and whose nullCnt is 0.
   But I prefer not to do so because it would cause unnecessary overhead. For example, for `Equals`, `NotEqualsTo`, and `In`, there is no need to compare the literal value against both minValue and maxValue; we can just compare it with the exact value.
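
To make the trade-off concrete, here is a minimal sketch of both options for an equality predicate. `SimpleStats`, `ofExactValue`, `mayMatch`, and `matches` are hypothetical names used only for illustration; they are not Hudi's `ColumnStats` or evaluator classes.

```java
// Hypothetical sketch comparing "wrap the exact value as stats" with a direct comparison.
final class EqualsPruningSketch {

  // Stand-in for a column statistics holder (min, max, null count).
  static final class SimpleStats<T extends Comparable<T>> {
    final T min;
    final T max;
    final long nullCount;

    SimpleStats(T min, T max, long nullCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
    }

    // Wraps one exact partition value as "point" statistics: min == max, no nulls.
    static <T extends Comparable<T>> SimpleStats<T> ofExactValue(T value) {
      return new SimpleStats<>(value, value, 0L);
    }
  }

  // Stats-based check: two comparisons, against min and against max.
  static <T extends Comparable<T>> boolean mayMatch(SimpleStats<T> stats, T literal) {
    return stats.min.compareTo(literal) <= 0 && stats.max.compareTo(literal) >= 0;
  }

  // Value-based check: a single comparison against the exact partition value.
  static <T> boolean matches(T partitionValue, T literal) {
    return partitionValue.equals(literal);
  }

  public static void main(String[] args) {
    String partitionValue = "20220101";
    SimpleStats<String> wrapped = SimpleStats.ofExactValue(partitionValue);
    for (String literal : new String[] {"20220101", "20220102"}) {
      System.out.println(literal + ": stats=" + mayMatch(wrapped, literal)
          + ", exact=" + matches(partitionValue, literal));
    }
  }
}
```

For point statistics the two checks agree, so the choice is about API surface versus one extra comparison and one wrapper object per partition, which is the overhead the comment refers to.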





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1135032944


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   `ColumnValues` do not come from literals.
   For p_date = '20220101',
   `ColumnValues` are all the p_date values in the table, and `ColumnStats` are the statistics for the p_date column. The literal values are used in both cases.
   Please see the details in https://github.com/apache/hudi/pull/8102/files#diff-3098f0a6c1cae51c7c4c99166d92e56a80129afb2fc12c03072e3e101587c932
   It's not good to wrap those values as `ColumnStats` because they are totally different things. For `ColumnStats`, the `evaluates` method returns false if there is no possibility of a match; it's just a best-effort estimate.
   For `ColumnValues`, the `evaluates` method returns true if it matches; it's an exact matching rule based on exact values.
   It's clearer to keep two API methods.
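
The difference between a best-effort estimate and an exact match can be illustrated with a small sketch. The class and method names below (`StatsVersusValues`, `mayContain`, `matches`) are assumptions made for this example and do not appear in the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: range statistics can only rule a match out,
// while exact values decide the match definitively.
final class StatsVersusValues {

  // Best-effort: false means "cannot match"; true only means "might match".
  static boolean mayContain(String min, String max, String literal) {
    return min.compareTo(literal) <= 0 && literal.compareTo(max) <= 0;
  }

  // Exact: true means the value really equals the literal.
  static boolean matches(String columnValue, String literal) {
    return columnValue.equals(literal);
  }

  public static void main(String[] args) {
    // Actual p_date values present in some file (illustrative data).
    List<String> values = Arrays.asList("20220101", "20220131");
    String min = Collections.min(values);
    String max = Collections.max(values);
    String literal = "20220115";

    // The literal lies inside [min, max], so the stats cannot exclude it.
    System.out.println(mayContain(min, max, literal)); // prints "true"

    // No actual value equals the literal, so the exact check finds no match.
    boolean anyMatch = values.stream().anyMatch(v -> matches(v, literal));
    System.out.println(anyMatch); // prints "false"
  }
}
```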





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136512519


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   Yes. For partition pruning, the statistics of the partition columns for a given partition are an exact value rather than a range with [min, max] and nullCnt.
   Of course, we could wrap this exact value in a `ColumnStats` object whose min equals its max and whose nullCnt is 0.
   But I prefer not to do so because it would cause unnecessary overhead. For example, when `Equals`, `NotEqualsTo`, or `In` is used to prune partitions, there is no need to compare the literal value against both minValue and maxValue; we can just compare it with the exact value.





[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1135032944


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   `ColumnValues` come from literal values.
   It's not good to wrap those values as `ColumnStats` because they are totally different things. For `ColumnStats`, the `evaluates` method returns false if there is no possibility of a match; it's just a best-effort estimate.
   For `ColumnValues`, the `evaluates` method returns true if it matches; it's an exact matching rule based on exact values.
   It's clearer to keep two API methods.





[GitHub] [hudi] beyond1920 closed pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 closed pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values
URL: https://github.com/apache/hudi/pull/8101




[GitHub] [hudi] danny0405 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136616801


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   Sorry, let's not introduce unnecessary interfaces; the performance gain should be small.





[GitHub] [hudi] hudi-bot commented on pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8101:
URL: https://github.com/apache/hudi/pull/8101#issuecomment-1455838565

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590",
       "triggerID" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff2a6fa29a3ea2b46532420f218859f6e59f10de Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8101:
URL: https://github.com/apache/hudi/pull/8101#issuecomment-1456073340

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590",
       "triggerID" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff2a6fa29a3ea2b46532420f218859f6e59f10de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1134970324


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   What do these `columnValues` refer to? Do they come from the constant literals? Can we just wrap these constants in another `ColumnStats` and reuse the existing interfaces?





[GitHub] [hudi] hudi-bot commented on pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8101:
URL: https://github.com/apache/hudi/pull/8101#issuecomment-1464112889

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590",
       "triggerID" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e06622760d679a66c9f9b17028bf69fb2ffab93",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15662",
       "triggerID" : "5e06622760d679a66c9f9b17028bf69fb2ffab93",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5e06622760d679a66c9f9b17028bf69fb2ffab93 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15662) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1135032944


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   `ColumnValues` do not come from literals.
   For p_date = '20220101',
   `ColumnValues` are all the p_date values in the table, and `ColumnStats` are the statistics of the p_date column. In both cases the literal values are compared against them, either against `ColumnValues` or against `ColumnStats`.
   Please see the details in https://github.com/apache/hudi/pull/8102/files#diff-3098f0a6c1cae51c7c4c99166d92e56a80129afb2fc12c03072e3e101587c932
   It's not good to wrap those values as `ColumnStats` because they are totally different things. For `ColumnStats`, the `evaluates` method returns false if there is no possibility of a match; it's just a best-effort estimate.
   For `ColumnValues`, the `evaluates` method returns true if it matches; it's an exact matching rule based on exact values.
   It's clearer to keep two API methods.





[GitHub] [hudi] hudi-bot commented on pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8101:
URL: https://github.com/apache/hudi/pull/8101#issuecomment-1455826043

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff2a6fa29a3ea2b46532420f218859f6e59f10de UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8101:
URL: https://github.com/apache/hudi/pull/8101#issuecomment-1463593084

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590",
       "triggerID" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e06622760d679a66c9f9b17028bf69fb2ffab93",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5e06622760d679a66c9f9b17028bf69fb2ffab93",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff2a6fa29a3ea2b46532420f218859f6e59f10de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590) 
   * 5e06622760d679a66c9f9b17028bf69fb2ffab93 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8101:
URL: https://github.com/apache/hudi/pull/8101#issuecomment-1463602325

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590",
       "triggerID" : "ff2a6fa29a3ea2b46532420f218859f6e59f10de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e06622760d679a66c9f9b17028bf69fb2ffab93",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15662",
       "triggerID" : "5e06622760d679a66c9f9b17028bf69fb2ffab93",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff2a6fa29a3ea2b46532420f218859f6e59f10de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15590) 
   * 5e06622760d679a66c9f9b17028bf69fb2ffab93 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15662) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] beyond1920 commented on a diff in pull request #8101: [HUDI-5879] Extends evaluators to support evaluate based on column values

Posted by "beyond1920 (via GitHub)" <gi...@apache.org>.
beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1140231739


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
   public interface Evaluator extends Serializable {
 
     /**
-     * Decides whether it's possible to match based on the column stats.
+     * Evaluates whether it's possible to match based on the column stats.
      *
      * @param columnStatsMap column statistics
-     * @return
+     * @return false if it's not possible to match, true otherwise.
      */
     boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+    /**
+     * Evaluates whether it matches based on the column values.
+     *
+     * @param columnValues column values
+     * @return true if it's matches, false otherwise.
+     */
+    boolean eval(Object[] columnValues);
   }

Review Comment:
   Opened a new [PR](https://github.com/apache/hudi/pull/8218)


