Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/25 23:35:14 UTC

[GitHub] [hudi] yihua opened a new pull request, #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

yihua opened a new pull request, #5427:
URL: https://github.com/apache/hudi/pull/5427

   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109299948

   ## CI report:
   
   * fe6cc9d4d51c6a8a6f2b8cbd969a06d835a4b8e0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8300) 
   * c71805c763f244e9e59832b9d67f48d74f1e9c64 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] alexeykudinkin commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109985938

   Given we're punting on this fix for 0.11, I think we can skip these changes, since #5430 will follow up fairly soon.
   
   WDYT?




[GitHub] [hudi] hudi-bot commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109149306

   ## CI report:
   
   * fe6cc9d4d51c6a8a6f2b8cbd969a06d835a4b8e0 UNKNOWN
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858131177


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -324,7 +326,14 @@ object HoodieSparkUtils extends SparkAdapterSupport {
       val name2Fields = tableAvroSchema.getFields.asScala.map(f => f.name() -> f).toMap
       // Here have to create a new Schema.Field object
       // to prevent throwing exceptions like "org.apache.avro.AvroRuntimeException: Field already used".
-      val requiredFields = requiredColumns.map(c => name2Fields(c))
+      val requiredFields = requiredColumns.filter(c => {

Review Comment:
   Can't we do this while appending the mandatory columns, i.e. compare them with the table schema and drop the missing fields? That way the filtering applies only to the mandatory columns we add and does not touch the query columns.
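
A minimal sketch of the suggestion above, assuming the table schema is available as a set of column names. `MandatoryColumnsSketch` and `appendMandatoryColumns` are hypothetical names for illustration, not Hudi APIs, and `ts` stands in for a preCombine field that the table does not contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Hudi code base.
final class MandatoryColumnsSketch {

  // Appends only those mandatory columns that exist in the table schema.
  // Query columns are passed through untouched, so a later strict lookup
  // can still reject a query column that is missing from the table.
  static List<String> appendMandatoryColumns(Set<String> tableColumns,
                                             List<String> queryColumns,
                                             List<String> mandatoryColumns) {
    // LinkedHashSet keeps the query order and drops duplicates.
    Set<String> required = new LinkedHashSet<>(queryColumns);
    for (String column : mandatoryColumns) {
      if (tableColumns.contains(column)) {
        required.add(column);
      }
    }
    return new ArrayList<>(required);
  }

  public static void main(String[] args) {
    Set<String> tableColumns =
        new HashSet<>(Arrays.asList("_hoodie_record_key", "a", "b"));
    // "ts" is an assumed preCombine field name that is absent from the table.
    List<String> required = appendMandatoryColumns(
        tableColumns,
        Arrays.asList("a", "b"),
        Arrays.asList("_hoodie_record_key", "ts"));
    System.out.println(required);
  }
}
```

Because the filter is applied only to the mandatory list, a query column that is missing from the table stays in the result and can still fail in the existing strict lookup.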





[GitHub] [hudi] nsivabalan commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858305310


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -54,29 +58,75 @@ private InternalSchemaUtils() {
    */
   public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> names) {
     // do check
-    List<Integer> prunedIds = names.stream().map(name -> {
+    List<Integer> prunedIds = names.stream()
+        .filter(name -> {
+          int id = schema.findIdByName(name);
+          if (id < 0) {
+            LOG.warn(String.format("cannot prune col: %s does not exist in hudi table", name));

Review Comment:
   Prior to this patch we threw an exception here, and now we don't. Is this change intended?





[GitHub] [hudi] yihua commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1110029382

   Agreed.  This is more of a band-aid fix for the 0.11.0 release.  Since we don't need it for 0.11.0, we should close this one in favor of #5430, which is a proper fix and improvement.  Closing this PR.




[GitHub] [hudi] nsivabalan commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858134674


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -324,7 +326,14 @@ object HoodieSparkUtils extends SparkAdapterSupport {
       val name2Fields = tableAvroSchema.getFields.asScala.map(f => f.name() -> f).toMap
       // Here have to create a new Schema.Field object
       // to prevent throwing exceptions like "org.apache.avro.AvroRuntimeException: Field already used".
-      val requiredFields = requiredColumns.map(c => name2Fields(c))
+      val requiredFields = requiredColumns.filter(c => {

Review Comment:
   Alexey: I'm not sure how much change your proposal would require, but let's try to make minimal changes so we can make progress without requiring more testing. In any case, we will revisit the preCombine field setting altogether for 0.12 and put in some fixes.





[GitHub] [hudi] yihua closed pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
yihua closed pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field
URL: https://github.com/apache/hudi/pull/5427




[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858121011


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -324,7 +326,14 @@ object HoodieSparkUtils extends SparkAdapterSupport {
       val name2Fields = tableAvroSchema.getFields.asScala.map(f => f.name() -> f).toMap
       // Here have to create a new Schema.Field object
       // to prevent throwing exceptions like "org.apache.avro.AvroRuntimeException: Field already used".
-      val requiredFields = requiredColumns.map(c => name2Fields(c))
+      val requiredFields = requiredColumns.filter(c => {

Review Comment:
   We should not relax this here, because `requiredColumns` will also contain query columns.
   
   Instead, we should provide `HoodieMergeOnReadRDD` with two Parquet readers:
   
   1. Primed for merging (i.e. with a schema containing the record key and precombine key)
   2. Primed for NO merging (i.e. whose schema could be essentially empty)
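
The two-reader idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TwoReaderSketch`, `ReaderColumns`, `plan`, and `columnsFor` are hypothetical names, schemas are reduced to lists of column names, and the choice between readers is assumed to depend on whether a file slice has log files to merge.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration; not the HoodieMergeOnReadRDD API.
final class TwoReaderSketch {

  // Column lists for the two base-file readers.
  static final class ReaderColumns {
    final List<String> forMerging;
    final List<String> forNoMerging;

    ReaderColumns(List<String> forMerging, List<String> forNoMerging) {
      this.forMerging = forMerging;
      this.forNoMerging = forNoMerging;
    }
  }

  // The merging reader gets the query columns plus the record key and
  // precombine key; the non-merging reader gets the query columns only,
  // which may be an empty list (for example, for a plain row count).
  static ReaderColumns plan(List<String> queryColumns,
                            String recordKeyField,
                            String preCombineField) {
    Set<String> merging = new LinkedHashSet<>(queryColumns);
    merging.add(recordKeyField);
    merging.add(preCombineField);
    return new ReaderColumns(new ArrayList<>(merging), new ArrayList<>(queryColumns));
  }

  // Assumption: only file slices with log files need the merging reader.
  static List<String> columnsFor(ReaderColumns columns, boolean hasLogFiles) {
    return hasLogFiles ? columns.forMerging : columns.forNoMerging;
  }
}
```

With this split, the non-merging path never asks for the precombine key, so a table without that field is unaffected when no merge is required, and the strict check on query columns can stay in place.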





[GitHub] [hudi] nsivabalan commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858306275


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -54,29 +58,75 @@ private InternalSchemaUtils() {
    */
   public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> names) {
     // do check
-    List<Integer> prunedIds = names.stream().map(name -> {
+    List<Integer> prunedIds = names.stream()
+        .filter(name -> {
+          int id = schema.findIdByName(name);
+          if (id < 0) {
+            LOG.warn(String.format("cannot prune col: %s does not exist in hudi table", name));
+            return false;
+          }
+          return true;
+        })
+        .map(schema::findIdByName).collect(Collectors.toList());
+    // find top parent field ID. eg: a.b.c, f.g.h, only collect id of a and f ignore all child field.
+    List<Integer> topParentFieldIds = new ArrayList<>();
+    names.stream().forEach(f -> {
+      int id = schema.findIdByName(f.split("\\.")[0]);
+      if (!topParentFieldIds.contains(id)) {
+        topParentFieldIds.add(id);
+      }
+    });
+    return pruneInternalSchemaByID(schema, prunedIds, topParentFieldIds);
+  }
+
+  /**
+   * Create project internalSchema, based on the project names which produced by query engine and Hudi fields.
+   * support nested project.
+   *
+   * @param schema      a internal schema.
+   * @param queryFields project names produced by query engine.
+   * @param hudiFields  project names required by Hudi merging.
+   * @return a project internalSchema.
+   */
+  public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> queryFields, List<String> hudiFields) {

Review Comment:
   With the addition of this new method, is the method at L 59 still called anywhere? Shouldn't all callers use the new one instead?





[GitHub] [hudi] hudi-bot commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109371819

   ## CI report:
   
   * c71805c763f244e9e59832b9d67f48d74f1e9c64 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8311) 
   


[GitHub] [hudi] yihua commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858156982


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -324,7 +326,14 @@ object HoodieSparkUtils extends SparkAdapterSupport {
       val name2Fields = tableAvroSchema.getFields.asScala.map(f => f.name() -> f).toMap
       // Here have to create a new Schema.Field object
       // to prevent throwing exceptions like "org.apache.avro.AvroRuntimeException: Field already used".
-      val requiredFields = requiredColumns.map(c => name2Fields(c))
+      val requiredFields = requiredColumns.filter(c => {

Review Comment:
   I'm going to rethink the minimal changes needed to unblock the 0.11 release.  The changes in their current shape introduce the problem with non-existent query columns that you mentioned.





[GitHub] [hudi] hudi-bot commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109301948

   ## CI report:
   
   * fe6cc9d4d51c6a8a6f2b8cbd969a06d835a4b8e0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8300) 
   * c71805c763f244e9e59832b9d67f48d74f1e9c64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8311) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858106552


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -54,13 +58,16 @@ private InternalSchemaUtils() {
    */
   public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> names) {
     // do check
-    List<Integer> prunedIds = names.stream().map(name -> {
-      int id = schema.findIdByName(name);
-      if (id == -1) {
-        throw new IllegalArgumentException(String.format("cannot prune col: %s which not exisit in hudi table", name));
-      }
-      return id;
-    }).collect(Collectors.toList());
+    List<Integer> prunedIds = names.stream()
+        .filter(name -> {
+          int id = schema.findIdByName(name);

Review Comment:
   Can you help me understand something? I understand that if a non-existent preCombine field is part of the names, we ignore it. 
   But if someone runs the query "select a,b,c from tbl", where b does not exist in the table, we have to throw an exception. Can you confirm that this behavior is not affected by the fix here?
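
To make the concern concrete, here is a small sketch contrasting the strict lookup removed in the hunk above with the lenient filter that replaces it. A `Map<String, Integer>` stands in for the schema, and the local `findIdByName` is a stand-in that returns -1 for a missing name, as the removed `id == -1` check implies; `PruneLookupSketch`, `strictIds`, and `lenientIds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the lookup in pruneInternalSchema; not Hudi code.
final class PruneLookupSketch {

  // Mimics schema.findIdByName: returns -1 when the column is unknown.
  static int findIdByName(Map<String, Integer> schema, String name) {
    return schema.getOrDefault(name, -1);
  }

  // Behavior before the patch: any unknown name fails the whole lookup.
  static List<Integer> strictIds(Map<String, Integer> schema, List<String> names) {
    List<Integer> ids = new ArrayList<>();
    for (String name : names) {
      int id = findIdByName(schema, name);
      if (id < 0) {
        throw new IllegalArgumentException(
            String.format("cannot prune col: %s does not exist in hudi table", name));
      }
      ids.add(id);
    }
    return ids;
  }

  // Behavior with the patch: unknown names are skipped, whether they come
  // from the preCombine setting or from the user's query.
  static List<Integer> lenientIds(Map<String, Integer> schema, List<String> names) {
    List<Integer> ids = new ArrayList<>();
    for (String name : names) {
      int id = findIdByName(schema, name);
      if (id >= 0) {
        ids.add(id);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Map<String, Integer> schema = new HashMap<>();
    schema.put("a", 1);
    schema.put("c", 3);
    // Column "b" from "select a,b,c from tbl" is silently dropped here.
    System.out.println(lenientIds(schema, Arrays.asList("a", "b", "c")));
  }
}
```

In this sketch the lenient variant cannot tell a missing preCombine field from a missing query column, which is exactly the case the question asks about.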
   





[GitHub] [hudi] hudi-bot commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109150765

   ## CI report:
   
   * fe6cc9d4d51c6a8a6f2b8cbd969a06d835a4b8e0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8300) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5427:
URL: https://github.com/apache/hudi/pull/5427#issuecomment-1109213490

   ## CI report:
   
   * fe6cc9d4d51c6a8a6f2b8cbd969a06d835a4b8e0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8300) 
   


[GitHub] [hudi] yihua commented on a diff in pull request #5427: [HUDI-3974] Fix schema projection to skip non-existent preCombine field

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858155285


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -54,13 +58,16 @@ private InternalSchemaUtils() {
    */
   public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> names) {
     // do check
-    List<Integer> prunedIds = names.stream().map(name -> {
-      int id = schema.findIdByName(name);
-      if (id == -1) {
-        throw new IllegalArgumentException(String.format("cannot prune col: %s which not exisit in hudi table", name));
-      }
-      return id;
-    }).collect(Collectors.toList());
+    List<Integer> prunedIds = names.stream()
+        .filter(name -> {
+          int id = schema.findIdByName(name);

Review Comment:
   Actually, on second thought, the filtering should not happen here.  I'm going to rethink how to make the fix.


