Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/05 03:15:21 UTC

[GitHub] [hudi] alexeykudinkin opened a new pull request, #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

alexeykudinkin opened a new pull request, #5224:
URL: https://github.com/apache/hudi/pull/5224

   ## What is the purpose of the pull request
   
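   Fix handling of the `isNotNull` predicate in Data Skipping. Previously, `colA IS NOT NULL` was translated into the column-stats index lookup `colA_nullCount = 0`. That lookup wrongly skipped files containing both null and non-null values of `colA`. The predicate is now translated into `colA_nullCount < colA_valueCount`, i.e. the file contains at least one non-null value.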
   
   ## Brief change log
   
    - Fixing handling of the `isNotNull` predicate in Data Skipping
    - Tidying up
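
   The fix comes down to one keep-or-skip rule per null predicate. Below is a minimal Scala sketch of those rules. It assumes per-file column statistics with a `nullCount` and a `valueCount`, and that `valueCount` counts all values, nulls included. `ColumnStats`, `NullFilter` and `NullPruning` are hypothetical names, not Hudi or Spark classes. A file must be kept whenever it *may* contain matching rows.

```scala
// Hypothetical per-file statistics for one column, loosely modeled on the
// nullCount / valueCount stats referenced in this PR.
// Assumption: valueCount counts every value in the file, nulls included.
final case class ColumnStats(nullCount: Long, valueCount: Long)

// Hypothetical stand-ins for the null-related filters (NOT Spark's Catalyst classes)
sealed trait NullFilter
case object IsNullFilter extends NullFilter
case object IsNotNullFilter extends NullFilter

object NullPruning {
  // True if a file with these stats MAY contain matching rows, i.e. it must be kept
  def mayMatch(filter: NullFilter, stats: ColumnStats): Boolean = filter match {
    case IsNullFilter    => stats.nullCount > 0                // at least one null
    case IsNotNullFilter => stats.nullCount < stats.valueCount // at least one non-null
  }

  // Pre-fix translation of IS NOT NULL ("nullCount = 0"), kept only for comparison
  def oldIsNotNull(stats: ColumnStats): Boolean = stats.nullCount == 0
}

// 10 values in the file, 3 of them null: 7 rows satisfy "colA IS NOT NULL"
val mixed = ColumnStats(nullCount = 3, valueCount = 10)
println(NullPruning.mayMatch(IsNotNullFilter, mixed)) // true: file is kept
println(NullPruning.oldIsNotNull(mixed))              // false: file would have been skipped
```

   Only the `IS NOT NULL` case changes. The old rule demanded that *no* value be null, but a file only needs *one* non-null value to contribute rows.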
   
   ## Verify this pull request
   
   This change added tests.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    - [ ] Commit message is descriptive of the change
    - [ ] CI is green
    - [ ] Necessary doc changes done or have another open PR
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1089630462

   ## CI report:
   
   * f054124c0553a818f87534c007a842e1f530e726 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7845) 
   * 6cabaa77d4184f9c337049d6b3a2fea5c4266626 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1088254248

   ## CI report:
   
   * acc1a2f3e3b849d62630729554ca4814337a7789 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7821) 
   * ee0f8a79e72fe1c6464d95042aec4c8f59d6b9ea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7825) 
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1088227465

   ## CI report:
   
   * acc1a2f3e3b849d62630729554ca4814337a7789 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7821) 
   

[GitHub] [hudi] codope commented on a diff in pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #5224:
URL: https://github.com/apache/hudi/pull/5224#discussion_r843039638


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala:
##########
@@ -211,10 +211,10 @@ object DataSkippingUtils extends Logging {
           .map(colName => GreaterThan(genColNumNullsExpr(colName), Literal(0)))
 
       // Filter "colA is not null"
-      // Translates to "colA_nullCount = 0" for index lookup
+      // Translates to "colA_nullCount < colA_valueCount" for index lookup
       case IsNotNull(attribute: AttributeReference) =>
         getTargetIndexedColumnName(attribute, indexSchema)
-          .map(colName => EqualTo(genColNumNullsExpr(colName), Literal(0)))
+          .map(colName => LessThan(genColNumNullsExpr(colName), genColValueCountExpr))

Review Comment:
   Got it. Makes sense then.



[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1088226264

   ## CI report:
   
   * acc1a2f3e3b849d62630729554ca4814337a7789 UNKNOWN
   

[GitHub] [hudi] nsivabalan merged pull request #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #5224:
URL: https://github.com/apache/hudi/pull/5224


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5224:
URL: https://github.com/apache/hudi/pull/5224#discussion_r843030033


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala:
##########
@@ -211,10 +211,10 @@ object DataSkippingUtils extends Logging {
           .map(colName => GreaterThan(genColNumNullsExpr(colName), Literal(0)))
 
       // Filter "colA is not null"
-      // Translates to "colA_nullCount = 0" for index lookup
+      // Translates to "colA_nullCount < colA_valueCount" for index lookup
       case IsNotNull(attribute: AttributeReference) =>
         getTargetIndexedColumnName(attribute, indexSchema)
-          .map(colName => EqualTo(genColNumNullsExpr(colName), Literal(0)))
+          .map(colName => LessThan(genColNumNullsExpr(colName), genColValueCountExpr))

Review Comment:
   Ha! Great question.
   
   The reason is a logical fallacy: "colA is not null" is not the same as "there are no nulls in colA". It actually means "colA contains at least one non-null value". (Check out the other expressions too: many of the expressions in this list have the same property.)
   
    - "is null" means the column has to contain null values (i.e. `nullCount` > 0)
    - "is not null" means the column has to contain non-null values (i.e. `nullCount` < `valueCount`)
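
The reasoning above can be checked by brute force. The following is a hypothetical Scala sketch, not code from this PR. It enumerates tiny files made of nulls and non-null values and derives `nullCount`/`valueCount` for each, assuming `valueCount` includes nulls. It then tests whether each index-side translation ever skips a file that actually contains matching rows. All names (`FileStats`, `statsOf`, `sound`, ...) are illustrative.

```scala
// Hypothetical model: a file's column is a Seq[Option[Int]] where None stands for null.
// Assumption: valueCount counts every value, nulls included.
final case class FileStats(nullCount: Long, valueCount: Long)

def statsOf(values: Seq[Option[Int]]): FileStats =
  FileStats(values.count(_.isEmpty).toLong, values.size.toLong)

// Ground truth: does the file really contain a row matching the filter?
def hasNull(values: Seq[Option[Int]]): Boolean    = values.exists(_.isEmpty)
def hasNonNull(values: Seq[Option[Int]]): Boolean = values.exists(_.isDefined)

// Index-side translations, evaluated against the stats only
def isNullRule(s: FileStats): Boolean   = s.nullCount > 0            // "colA contains null"
def isNotNullOld(s: FileStats): Boolean = s.nullCount == 0           // "there is no null in colA"
def isNotNullNew(s: FileStats): Boolean = s.nullCount < s.valueCount // "colA contains non-null"

// Every small file made of 0..3 nulls and 0..3 non-null values
val files: Seq[Seq[Option[Int]]] =
  for {
    nulls    <- 0 to 3
    nonNulls <- 0 to 3
  } yield Seq.fill[Option[Int]](nulls)(None) ++ Seq.fill[Option[Int]](nonNulls)(Some(1))

// A rule is sound for data skipping if it never skips a file that has matching rows
def sound(matches: Seq[Option[Int]] => Boolean, rule: FileStats => Boolean): Boolean =
  files.forall(f => !matches(f) || rule(statsOf(f)))

println(s"IS NULL rule sound:         ${sound(hasNull, isNullRule)}")
println(s"old IS NOT NULL rule sound: ${sound(hasNonNull, isNotNullOld)}")
println(s"new IS NOT NULL rule sound: ${sound(hasNonNull, isNotNullNew)}")
```

Running it should report the `IS NULL` rule and the new `IS NOT NULL` rule as sound (`true`) and the old `nullCount = 0` rule as unsound (`false`). The old rule fails on any file with at least one null and one non-null value.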



[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1089711898

   ## CI report:
   
   * 6cabaa77d4184f9c337049d6b3a2fea5c4266626 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7851) 
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1089373494

   ## CI report:
   
   * ee0f8a79e72fe1c6464d95042aec4c8f59d6b9ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7825) 
   * f054124c0553a818f87534c007a842e1f530e726 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7845) 
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1089369743

   ## CI report:
   
   * ee0f8a79e72fe1c6464d95042aec4c8f59d6b9ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7825) 
   * f054124c0553a818f87534c007a842e1f530e726 UNKNOWN
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1090458591

   ## CI report:
   
   * 6cabaa77d4184f9c337049d6b3a2fea5c4266626 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7851) 
   * 4075e193a3de8129f9fb1efa6e5a674566e50c01 UNKNOWN
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1088251550

   ## CI report:
   
   * acc1a2f3e3b849d62630729554ca4814337a7789 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7821) 
   * ee0f8a79e72fe1c6464d95042aec4c8f59d6b9ea UNKNOWN
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1088299195

   ## CI report:
   
   * ee0f8a79e72fe1c6464d95042aec4c8f59d6b9ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7825) 
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1090465028

   ## CI report:
   
   * 6cabaa77d4184f9c337049d6b3a2fea5c4266626 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7851) 
   * 4075e193a3de8129f9fb1efa6e5a674566e50c01 UNKNOWN
   * d866e608a87c606ccb494d5f064c891044ded75a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7871) 
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1090536924

   ## CI report:
   
   * 4075e193a3de8129f9fb1efa6e5a674566e50c01 UNKNOWN
   * d866e608a87c606ccb494d5f064c891044ded75a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7871) 
   

[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1089504573

   ## CI report:
   
   * f054124c0553a818f87534c007a842e1f530e726 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7845) 
   

[GitHub] [hudi] codope commented on a diff in pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #5224:
URL: https://github.com/apache/hudi/pull/5224#discussion_r842353165


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##########
@@ -224,7 +228,8 @@ object ColumnStatsIndexSupport {
   private val COLUMN_STATS_INDEX_FILE_COLUMN_NAME = "fileName"
   private val COLUMN_STATS_INDEX_MIN_VALUE_STAT_NAME = "minValue"
   private val COLUMN_STATS_INDEX_MAX_VALUE_STAT_NAME = "maxValue"
-  private val COLUMN_STATS_INDEX_NUM_NULLS_STAT_NAME = "num_nulls"
+  private val COLUMN_STATS_INDEX_NULL_COUNT_STAT_NAME = "nullCount"

Review Comment:
   Can we not reuse HoodieMetadataPayload.* constants here as well?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala:
##########
@@ -211,10 +211,10 @@ object DataSkippingUtils extends Logging {
           .map(colName => GreaterThan(genColNumNullsExpr(colName), Literal(0)))
 
       // Filter "colA is not null"
-      // Translates to "colA_nullCount = 0" for index lookup
+      // Translates to "colA_nullCount < colA_valueCount" for index lookup
       case IsNotNull(attribute: AttributeReference) =>
         getTargetIndexedColumnName(attribute, indexSchema)
-          .map(colName => EqualTo(genColNumNullsExpr(colName), Literal(0)))
+          .map(colName => LessThan(genColNumNullsExpr(colName), genColValueCountExpr))

Review Comment:
   Filter `colA is not null` is the complement of `colA is null`, so why do the two have different translations (one depends on the valueCount, while the other depends on Literal(0))?
   
   I mean, if `colA is null` is translated to `GreaterThan(genColNumNullsExpr(colName), Literal(0))`, then shouldn't `colA is not null` be translated to `LessThanOrEqual(genColNumNullsExpr(colName), Literal(0))`?
   
   Or, if you say that `colA is not null` should be translated to `LessThan(genColNumNullsExpr(colName), genColValueCountExpr)`, then shouldn't `colA is null` be translated to `GreaterThanOrEqual(genColNumNullsExpr(colName), genColValueCountExpr)`?
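
One way to reason about the question above: data skipping may prune a file only when the index proves that no row in it can satisfy the predicate, so each translation has to answer "could this file contain a matching row?" rather than mirror the row-level logic. The following is a minimal Scala sketch, not Hudi code: `ColumnStats`, `NullPredicatePruning` and the file names are hypothetical, and it assumes `valueCount` counts every row in the file, nulls included (as the `nullCount < valueCount` condition in the diff implies). Under that assumption a file holding both nulls and non-nulls can satisfy `colA is null` and `colA is not null` at the same time, so the two file-level conditions are not complements of each other.

```scala
// Minimal sketch (hypothetical names, not Hudi's API): file-level pruning for
// IS NULL / IS NOT NULL from per-file column statistics.

// Stats for one column in one file. Assumes valueCount includes null values.
final case class ColumnStats(nullCount: Long, valueCount: Long)

object NullPredicatePruning {
  // "colA IS NULL" may match some row only if the file holds at least one null.
  def mayContainNull(s: ColumnStats): Boolean = s.nullCount > 0

  // "colA IS NOT NULL" may match some row only if at least one value is non-null.
  def mayContainNonNull(s: ColumnStats): Boolean = s.nullCount < s.valueCount

  // Previous translation ("nullCount = 0"): keeps only files without any nulls,
  // so it prunes files that mix nulls and non-nulls even though they can match.
  def legacyIsNotNull(s: ColumnStats): Boolean = s.nullCount == 0

  def main(args: Array[String]): Unit = {
    val files = Seq(
      "all-nulls.parquet" -> ColumnStats(nullCount = 10, valueCount = 10),
      "mixed.parquet"     -> ColumnStats(nullCount = 3, valueCount = 10),
      "no-nulls.parquet"  -> ColumnStats(nullCount = 0, valueCount = 10)
    )
    for ((name, stats) <- files) {
      println(
        s"$name: isNull=${mayContainNull(stats)}, " +
          s"isNotNull=${mayContainNonNull(stats)}, " +
          s"legacyIsNotNull=${legacyIsNotNull(stats)}"
      )
    }
  }
}
```

In this sketch `mixed.parquet` is kept by both the `isNull` and the `isNotNull` checks, while the legacy `nullCount = 0` check drops it even though 7 of its 10 rows are non-null. By the same reasoning, translating `colA is null` to `nullCount >= valueCount` would keep only files made entirely of nulls and would also drop `mixed.parquet`.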





[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1088230011

   ## CI report:
   
   * acc1a2f3e3b849d62630729554ca4814337a7789 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7821) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739][Stacked on 5208] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1089633337

   ## CI report:
   
   * f054124c0553a818f87534c007a842e1f530e726 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7845) 
   * 6cabaa77d4184f9c337049d6b3a2fea5c4266626 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7851) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5224:
URL: https://github.com/apache/hudi/pull/5224#issuecomment-1090461823

   ## CI report:
   
   * 6cabaa77d4184f9c337049d6b3a2fea5c4266626 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7851) 
   * 4075e193a3de8129f9fb1efa6e5a674566e50c01 UNKNOWN
   * d866e608a87c606ccb494d5f064c891044ded75a UNKNOWN
   


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5224: [HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skipping

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5224:
URL: https://github.com/apache/hudi/pull/5224#discussion_r844137751


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##########
@@ -224,7 +228,8 @@ object ColumnStatsIndexSupport {
   private val COLUMN_STATS_INDEX_FILE_COLUMN_NAME = "fileName"
   private val COLUMN_STATS_INDEX_MIN_VALUE_STAT_NAME = "minValue"
   private val COLUMN_STATS_INDEX_MAX_VALUE_STAT_NAME = "maxValue"
-  private val COLUMN_STATS_INDEX_NUM_NULLS_STAT_NAME = "num_nulls"
+  private val COLUMN_STATS_INDEX_NULL_COUNT_STAT_NAME = "nullCount"

Review Comment:
   Good call. This was copied over from a legacy component without much thought. Addressed.


