Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/10 04:57:36 UTC

[GitHub] [hudi] trushev opened a new pull request, #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution (RFC-33)

trushev opened a new pull request, #5830:
URL: https://github.com/apache/hudi/pull/5830

   ## What is the purpose of the pull request
   This PR adds support for reading with Flink when comprehensive schema evolution (RFC-33) is enabled and the table history contains operations such as *add column*, *rename column*, *change column type*, and *drop column*.
   
   ## Brief change log
   
     - Added a new option to enable comprehensive schema evolution in Flink
     - Key changes are made inside `CopyOnWriteInputFormat` and `MergeOnReadInputFormat`: on open, the input format computes the schema of the file being read; if that schema differs from the queried schema, it builds a cast map. After a record is read, type conversion is performed according to the constructed map (see the sketch below).
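   
   A minimal, self-contained sketch of the cast-map idea (illustrative only; the PR's actual `CastMap` keys conversions by Flink `LogicalType` pairs, while the class and names below are simplified stand-ins):
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.Function;
   
   /** Simplified stand-in for the PR's CastMap: field position -> conversion. */
   final class SimpleCastMap {
     private final Map<Integer, Function<Object, Object>> casts = new HashMap<>();
   
     /** Register a conversion for the field at the given position. */
     void add(int pos, Function<Object, Object> cast) {
       casts.put(pos, cast);
     }
   
     /** Apply the registered conversion, if any; nulls pass through unchanged. */
     Object castIfNeeded(int pos, Object val) {
       Function<Object, Object> cast = casts.get(pos);
       return (cast == null || val == null) ? val : cast.apply(val);
     }
   
     public static void main(String[] args) {
       SimpleCastMap castMap = new SimpleCastMap();
       // Field 0 was written as int but is queried as long (type widened by evolution).
       castMap.add(0, v -> ((Integer) v).longValue());
       System.out.println(castMap.castIfNeeded(0, 1));   // -> 1 (as Long)
       System.out.println(castMap.castIfNeeded(1, "x")); // -> "x" (no cast registered)
     }
   }
   ```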
   
   ## Verify this pull request
   
   This change adds tests and can be verified as follows:
     - Added unit test `TestCastMap` to verify that type conversion is correct
     - Added integration test `ITTestSchemaEvolution` to verify that a table with added, renamed, type-changed, and dropped columns is read as expected.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022524997


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/InternalSchema.java:
##########
@@ -66,6 +77,11 @@ public InternalSchema(Field... columns) {
     this(DEFAULT_VERSION_ID, Arrays.asList(columns));
   }
 
+  public InternalSchema(long versionId, Schema avroSchema) {
+    this(versionId, ((Types.RecordType) AvroInternalSchemaConverter.convertToField(avroSchema)).fields());
+    this.avroSchema = avroSchema;
+  }

Review Comment:
   It was an idea how to solve this one: https://github.com/apache/hudi/pull/5830#discussion_r925340228
   But I'm going to revert it now due to https://github.com/apache/hudi/pull/5830#issuecomment-1314709284





[GitHub] [hudi] xiarixiaoyao commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1315020080

   
   
   
   > > Either we have some tool for fetching the right avro schema in evolution use cases
   > 
   > `avroSchema` does not support custom ids. We use them to build the merged internal schema, mapping an old type (name) to a new type (name).
   > 
   > > or we keep only the internal schema that is compatible for evolution.
   > 
   > I tried, but it didn't work. There were differences between the original `avroSchema` and the `internalSchema` converted back to an Avro schema. Maybe it was a bug in the converter. So I just kept the original `avroSchema` inside `internalSchema`. I don't like this approach and want to revert it. "Leave here the changes only concerning flink"
   
   I disagree with putting the Avro schema into the internal schema: `internalSchema` is an independent schema abstraction and should not be bound to `avroSchema`.
   
   **There were differences between the original `avroSchema` and the `internalSchema` converted back to an Avro schema. Maybe it was a bug in the converter.**
   
   This is not a bug. Flink and Spark each have their own schema converter, and the Avro schemas produced by the two converters differ. This is a gap we need to unify in the future; https://github.com/apache/hudi/pull/6358 does the unification for Spark, but Flink is not included. For Flink, maybe we can create a new convert function `AvroInternalSchemaConverter.buildAvroSchemaFromInternalSchema` to convert an internalSchema to Avro, just like Flink's `AvroSchemaConverter`.
   
   






[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022526462


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -491,7 +492,7 @@ public Pair<HoodieMetadataMergedLogRecordReader, Long> getLogRecordScanner(List<
         .withFileSystem(metadataMetaClient.getFs())
         .withBasePath(metadataBasePath)
         .withLogFilePaths(sortedLogFilePaths)
-        .withReaderSchema(schema)
+        .withReaderSchema(AvroInternalSchemaConverter.convertToEmpty(schema))

Review Comment:
   I was asked to remove the second method https://github.com/apache/hudi/pull/5830#discussion_r1018637325





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314911375

   > Either we have some tool for fetching the right avro schema in evolution use cases
   
   `avroSchema` does not support custom ids. We use them to build the merged internal schema, mapping an old type (name) to a new type (name).
   
    > or we keep only the internal schema that is compatible for evolution.
   
   I tried, but it didn't work. There were differences between the original `avroSchema` and the `internalSchema` converted back to an Avro schema. Maybe it was a bug in the converter. So I just kept the original `avroSchema` inside `internalSchema`.
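   
   To make the "custom ids" point concrete, here is a tiny, purely illustrative demo of id-based column mapping (the ids, names, and `Map`-based representation are assumptions for illustration; Hudi's real `InternalSchema` merge logic is more involved):
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Illustration only: stable field ids survive renames, so an old column name
   // can be correlated with its new name by id -- something a plain Avro schema
   // (which carries no custom ids) cannot express.
   public class FieldIdMappingDemo {
     public static void main(String[] args) {
       Map<Integer, String> writeTimeNames = new HashMap<>(); // field id -> name when the file was written
       Map<Integer, String> queryTimeNames = new HashMap<>(); // field id -> name in the queried schema
       writeTimeNames.put(1, "fare");
       queryTimeNames.put(1, "amount"); // same field id, renamed later
   
       queryTimeNames.forEach((id, newName) -> {
         String oldName = writeTimeNames.get(id);
         if (oldName != null && !oldName.equals(newName)) {
           System.out.println("read file column '" + oldName + "' as query column '" + newName + "'");
         }
       });
     }
   }
   ```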
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1330033888

   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] waywtdcc commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1336313682

   @trushev @danny0405 Hello, can this PR be merged into 0.12.1 to support Flink schema evolution? Do I need to merge other PRs?




[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181325353

   I think it is ready to merge




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927312625


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##########
@@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
       return this;
     }
 
+    public Builder withInternalSchema(InternalSchema internalSchema) {
+      this.internalSchema = internalSchema;
+      return this;

Review Comment:
   I used the second schema here to be consistent with `HoodieMergedLogRecordScanner`, which already uses this approach to scan logs in `HoodieMergeOnReadRDD#scanLog`. Do you think it is bad practice?





[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927355625


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   > we need to read baseFile1 with schema1
   
   Shouldn't the schema metadata tell us the latest schema then?





[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179860060

   ## CI report:
   
   * 2dc34ce7581e1f0c631901ed9060837343220f2f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818) 
   * cb28b4f297e0e0b41ce9ea46b8be5002190e9f94 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1180806609

   ## CI report:
   
   * 239534226d5cf6cbef8ef1e8dc454daf3dacf20b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r925301529


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1667,16 +1667,14 @@ public void reOrderColPosition(String colName, String referColName, TableChange.
   private Pair<InternalSchema, HoodieTableMetaClient> getInternalSchemaAndMetaClient() {
     HoodieTableMetaClient metaClient = createMetaClient(true);
     TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
-    Option<InternalSchema> internalSchemaOption = schemaUtil.getTableInternalSchemaFromCommitMetadata();
-    if (!internalSchemaOption.isPresent()) {
-      throw new HoodieException(String.format("cannot find schema for current table: %s", config.getBasePath()));
-    }
-    return Pair.of(internalSchemaOption.get(), metaClient);

Review Comment:
   I took a quick look at the PR and feel that the schema-related code is too invasive, scattered everywhere, which is hard to maintain and prone to bugs; we need a neater approach to the code engineering.





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927306341


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -279,7 +279,7 @@ private void saveInternalSchema(HoodieTable table, String instantTime, HoodieCom
     FileBasedInternalSchemaStorageManager schemasManager = new FileBasedInternalSchemaStorageManager(table.getMetaClient());
     if (!historySchemaStr.isEmpty() || Boolean.parseBoolean(config.getString(HoodieCommonConfig.RECONCILE_SCHEMA.key()))) {
       InternalSchema internalSchema;
-      Schema avroSchema = HoodieAvroUtils.createHoodieWriteSchema(new Schema.Parser().parse(config.getSchema()));
+      Schema avroSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
       if (historySchemaStr.isEmpty()) {

Review Comment:
   1) Because `HoodieAvroUtils` is in `hudi-common` while `HoodieWriteConfig` is in `hudi-client-common`, which depends on the former.
   2) Working around (1) by using `HoodieConfig` instead of `HoodieWriteConfig` does not make sense because there is no `getSchema()` in `HoodieConfig`; there is not even an `AVRO_SCHEMA_STRING` for `HoodieConfig.getString(AVRO_SCHEMA_STRING)`. I think passing the raw string `hoodie.avro.schema` is an inappropriate approach.





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1001524335


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -453,6 +453,7 @@ private MergeOnReadInputFormat mergeOnReadInputFormat(
         this.requiredPos,
         this.conf.getString(FlinkOptions.PARTITION_DEFAULT_NAME),
         this.limit == NO_LIMIT_CONSTANT ? Long.MAX_VALUE : this.limit, // ParquetInputFormat always uses the limit value
+        this.conf,
         getParquetConf(this.conf, this.hadoopConf),

Review Comment:
   fixed





[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314919173

   > Reader schema: the rest cases
   
   What do you mean by "the rest cases"? Isn't schema evolution either enabled or disabled?
   
   > I don't like this approach and want to revert it
   
   I agree, and let's make the refactoring in a separate PR.




[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314709284

   > > withInternalSchema(
   > 
   > Agree
   
   So, as I understand it, you don't mind if I revert all changes not related to Flink, which means adding `withInternalSchema` to `HoodieUnMergedLogRecordScanner`.




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022515240


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   > @trushev can we avoid moving this code snippet? I do not think Flink evolution needs to modify this code. #6358 and #7183 will optimize it.
   
   @xiarixiaoyao This code should be moved from `HoodieMergeHelper` to `BaseMergeHelper` due to the current class hierarchy:
   <img width="439" src="https://user-images.githubusercontent.com/42293632/201876103-6e59834e-ad85-4b22-9de4-257e26cdfd88.png">
   
   I don't want to modify that code; I just want to reuse it in Flink.





[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022523171


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataMergedLogRecordReader.java:
##########
@@ -54,15 +54,15 @@ public class HoodieMetadataMergedLogRecordReader extends HoodieMergedLogRecordSc
 
   private HoodieMetadataMergedLogRecordReader(FileSystem fs, String basePath, String partitionName,
                                               List<String> logFilePaths,
-                                              Schema readerSchema, String latestInstantTime,
+                                              InternalSchema readerSchema, String latestInstantTime,
                                               Long maxMemorySizeInBytes, int bufferSize,
                                               String spillableMapBasePath,
                                               ExternalSpillableMap.DiskMapType diskMapType,
                                               boolean isBitCaskDiskMapCompressionEnabled,
                                               Option<InstantRange> instantRange, boolean allowFullScan, boolean useScanV2) {
     super(fs, basePath, logFilePaths, readerSchema, latestInstantTime, maxMemorySizeInBytes, true, false, bufferSize,
         spillableMapBasePath, instantRange, diskMapType, isBitCaskDiskMapCompressionEnabled, false, allowFullScan,
-            Option.of(partitionName), InternalSchema.getEmptyInternalSchema(), useScanV2);
+            Option.of(partitionName), useScanV2);

Review Comment:
   No need to modify this class



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -491,7 +492,7 @@ public Pair<HoodieMetadataMergedLogRecordReader, Long> getLogRecordScanner(List<
         .withFileSystem(metadataMetaClient.getFs())
         .withBasePath(metadataBasePath)
         .withLogFilePaths(sortedLogFilePaths)
-        .withReaderSchema(schema)
+        .withReaderSchema(AvroInternalSchemaConverter.convertToEmpty(schema))

Review Comment:
   `HoodieBackedTableMetadata` does not support schema evolution.
   No need to modify this class.





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1316199532

   > @trushev @danny0405 @xiarixiaoyao Thank you folks for pushing the schema evolution support in Flink! Do you guys think we can merge this before the 0.13.0 code freeze (Dec 12)?
   
   I'm going to rework the PR according to the comments above this week.




[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1324519494

   > @danny0405 It is hard to maintain this PR. Despite the fact that this feature is related only to Flink, changes are needed in the common part. For example, [#5830 (comment)](https://github.com/apache/hudi/pull/5830#discussion_r1023681719) and [#5830 (comment)](https://github.com/apache/hudi/pull/5830#discussion_r1022504653). Because of this, merge conflicts often appear; currently, `HoodieMergeHelper.java` is modified on both sides again. I've decided that **I will make the common-part changes in separate PRs**. I hope that such changes will be quickly approved and merged into the master branch. That will reduce the number of conflicts and make it easier to maintain this PR as well as to review the code.
   
   Sure, let's resolve the upstream issues first. You can ping me if review is needed.




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033212254


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   So, the cast map returns null only if the original value is null.
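   
   A hypothetical test capturing that contract (written against the `CastMap` API exercised above; the class name `TestCastMapNulls` and the standalone framing are illustrative):
   
   ```java
   import static org.junit.jupiter.api.Assertions.assertNull;
   
   import org.apache.flink.table.types.logical.BigIntType;
   import org.apache.flink.table.types.logical.IntType;
   import org.junit.jupiter.api.Test;
   
   public class TestCastMapNulls {
   
     @Test
     public void testNullPassesThrough() {
       CastMap castMap = new CastMap();
       castMap.add(0, new IntType(), new BigIntType());
       // A registered cast is skipped for null input: null in, null out.
       assertNull(castMap.castIfNeeded(0, null));
     }
   }
   ```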





[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1329995935

   @hudi-bot run azure




[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1336711399

   > @trushev @danny0405 Hello, can this pr be merged into 0.12.1 to support flink schema evolution? Do I need to merge other PRs?
   
   Yes, there are several commits that this PR depends on. I think it is not a big deal to backport the feature; I'm just not sure about the release policy. Is such a change suitable for a minor update, 0.12.1 -> 0.12.2?




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1031134194


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FlinkInternalSchemaManager.java:
##########
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import org.apache.hadoop.fs.Path;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields that are actual at a certain point in time.
+ * If a field was renamed in the queried schema, its old name, relevant at the provided time, will be returned.
+ * If the type of a field was changed, its old type will be returned, and a projection will be created that converts the old type to the queried one.
+ */
+public final class FlinkInternalSchemaManager implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  /**
+   * Creates the manager if schema evolution enabled.
+   */
+  public static Option<FlinkInternalSchemaManager> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new FlinkInternalSchemaManager(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  FlinkInternalSchemaManager(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  /**
+   * Returns query schema as InternalSchema.
+   */
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  /**
+   * Returns schema of fileSplit.
+   */
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  /**
+   * Returns schema of mor fileSplit.
+   */
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    Option<String> basePath = split.getBasePath();
+    String commitTime;
+    if (basePath.isPresent()) {
+      String name = new Path(basePath.get()).getName();
+      commitTime = FSUtils.getCommitTime(name);
+    } else {
+      commitTime = split.getLatestCommit();
+    }
+    return getActualSchema(commitTime);
+  }
+
+  /**
+   * Returns list of field names in internalSchema.
+   */
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+    return internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  /**
+   * Returns list of field types in internalSchema.
+   */
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+    return AvroSchemaConverter.convertToDataType(
+        AvroInternalSchemaConverter.convert(internalSchema, getTableName())).getChildren();
+  }
+
+  /**
+   * Returns castMap based on conversions from actualSchema to querySchema.
+   */
+  public CastMap getCastMap(InternalSchema querySchema, InternalSchema actualSchema) {
+    return CastMap.of(getTableName(), querySchema, actualSchema);
+  }
+
+  /**
+   * Returns array of types positioned in fieldTypes according to selectedFields.
+   */
+  public LogicalType[] project(List<DataType> fieldTypes, int[] selectedFields) {
+    return Arrays.stream(selectedFields)

Review Comment:
   Method removed
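   
   For orientation, a hedged sketch of how an input format might wire this manager together on open, based solely on the methods shown above (the method itself and its wiring are illustrative; the PR's actual integration in `MergeOnReadInputFormat` may differ):
   
   ```java
   // Hedged sketch, not the PR's actual code; assumes the classes above are on
   // the classpath and the usual imports are in place.
   private void openWithSchemaEvolution(Configuration conf, MergeOnReadInputSplit split) {
     Option<FlinkInternalSchemaManager> managerOpt = FlinkInternalSchemaManager.of(conf);
     if (managerOpt.isPresent()) {
       FlinkInternalSchemaManager manager = managerOpt.get();
       InternalSchema querySchema = manager.getQuerySchema();
       InternalSchema actualSchema = manager.getActualSchema(split); // schema of the file being opened
       List<String> actualFieldNames = manager.getFieldNames(actualSchema);
       List<DataType> actualFieldTypes = manager.getFieldTypes(actualSchema);
       CastMap castMap = manager.getCastMap(querySchema, actualSchema);
       // Read the file with the actual (write-time) names/types, then apply
       // castMap to each record so emitted rows match the queried schema.
     }
   }
   ```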





[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1331624352

   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   * 5b7eed269294cb9be8f0875517b062e45e7ddb84 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13340) 
   
   Bot commands:
   
    - `@hudi-bot run azure` re-run the last Azure build




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152598283

   ## CI report:
   
   * 06a66b2cc3450cd29b13b755976480317e134b4c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218) 
   * 7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220) 
   
   Bot commands:
   
    - `@hudi-bot run azure` re-run the last Azure build




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1153623403

   ## CI report:
   
   * 80d33308c8a4e290c0c7b66fff4023e8825f8163 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252) 
   
   Bot commands:
   
    - `@hudi-bot run azure` re-run the last Azure build




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r911569288


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java:
##########
@@ -227,7 +228,10 @@ protected void loadRecords(String partitionPath) throws Exception {
             .filter(logFile -> isValidFile(logFile.getFileStatus()))
             .map(logFile -> logFile.getPath().toString())
             .collect(toList());
-        HoodieMergedLogRecordScanner scanner = FormatUtils.logScanner(logPaths, schema, latestCommitTime.get().getTimestamp(),
+        InternalSchema internalSchema = new TableSchemaResolver(this.hoodieTable.getMetaClient())

Review Comment:
   Fixed





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1171860774

   Sorry for the force push; I rebased onto the latest master to pick up the fix for [HUDI-4258]




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927390976


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   That fact forces me to change `DataType[] fullFieldTypes` once the input split is passed to `MergeOnReadInputFormat`





[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1023780152


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FlinkInternalSchemaManager.java:
##########
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import org.apache.hadoop.fs.Path;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields as they existed at a certain point in time.
+ * If a field was renamed relative to the queried schema, its old name, the one valid at the provided time, is returned.
+ * If the type of a field was changed, its old type is returned, and a projection is created to convert the old type to the queried one.
+ */
+public final class FlinkInternalSchemaManager implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  /**
+   * Creates the manager if schema evolution enabled.
+   */
+  public static Option<FlinkInternalSchemaManager> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new FlinkInternalSchemaManager(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  FlinkInternalSchemaManager(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  /**
+   * Returns query schema as InternalSchema.
+   */
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  /**
+   * Returns schema of fileSplit.
+   */
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  /**
+   * Returns schema of mor fileSplit.
+   */
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    Option<String> basePath = split.getBasePath();
+    String commitTime;
+    if (basePath.isPresent()) {
+      String name = new Path(basePath.get()).getName();
+      commitTime = FSUtils.getCommitTime(name);
+    } else {
+      commitTime = split.getLatestCommit();
+    }
+    return getActualSchema(commitTime);
+  }
+
+  /**
+   * Returns list of field names in internalSchema.
+   */
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+    return internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  /**
+   * Returns list of field types in internalSchema.
+   */
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+    return AvroSchemaConverter.convertToDataType(
+        AvroInternalSchemaConverter.convert(internalSchema, getTableName())).getChildren();
+  }
+
+  /**
+   * Returns castMap based on conversions from actualSchema to querySchema.
+   */
+  public CastMap getCastMap(InternalSchema querySchema, InternalSchema actualSchema) {
+    return CastMap.of(getTableName(), querySchema, actualSchema);
+  }
+
+  /**
+   * Returns array of types positioned in fieldTypes according to selectedFields.
+   */
+  public LogicalType[] project(List<DataType> fieldTypes, int[] selectedFields) {
+    return Arrays.stream(selectedFields)

Review Comment:
   The method does not belong here; maybe put it in `DataTypeUtils`.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
##########
@@ -394,4 +416,22 @@ private InflaterInputStreamFactory<?> getInflaterInputStreamFactory(org.apache.h
     }
   }
 
+  private void setActualFields(FileInputSplit fileSplit) {
+    FlinkInternalSchemaManager sm = schemaManager.get();
+    InternalSchema actualSchema = sm.getActualSchema(fileSplit);
+    List<DataType> fieldTypes = sm.getFieldTypes(actualSchema);
+    CastMap castMap = sm.getCastMap(sm.getQuerySchema(), actualSchema);
+    int[] shiftedSelectedFields = Arrays.stream(selectedFields).map(pos -> pos + HOODIE_META_COLUMNS.size()).toArray();
+    if (castMap.containsAnyPos(shiftedSelectedFields)) {

Review Comment:
   Be careful that the metadata fields may also be queried.
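   
   A hedged illustration of the concern: the uniform shift above is only valid if none of the selected positions refers to a metadata column itself (the positions below are hypothetical):
   
       int[] selected = {0, 6};  // suppose 0 already names _hoodie_commit_time
       int[] shifted = Arrays.stream(selected).map(pos -> pos + 5).toArray();
       // shifted = {5, 11}: the meta field selected at position 0 is now mis-addressed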



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -286,4 +286,18 @@ public static Map<String, String> collectRenameCols(InternalSchema oldSchema, In
       return e.substring(lastDotIndex == -1 ? 0 : lastDotIndex + 1);
     }));
   }
+
+  /**
+   * Returns whether passed types are the same.
+   *
+   * @param t1 first type
+   * @param t2 second type
+   * @return true if types are the same
+   */
+  public static boolean isSameType(Type t1, Type t2) {
+    if (t1 instanceof Types.DecimalType && t2 instanceof Types.DecimalType) {
+      return t1.equals(t2);

Review Comment:
   Can we implement `Type#equals` correctly instead of adding this tool?
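   
   A minimal sketch of that direction, assuming `DecimalType` is defined by `precision` and `scale` fields (field names hypothetical, not the actual Hudi implementation):
   
       @Override
       public boolean equals(Object o) {
         if (this == o) {
           return true;
         }
         if (!(o instanceof Types.DecimalType)) {
           return false;
         }
         Types.DecimalType that = (Types.DecimalType) o;
         // compare the defining fields instead of special-casing decimals in a helper
         return precision == that.precision && scale == that.scale;
       }
   
       @Override
       public int hashCode() {
         return Objects.hash(precision, scale); // java.util.Objects
       }
   
   With such an `equals`, the `isSameType` helper above would reduce to a plain `t1.equals(t2)` call.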



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FlinkInternalSchemaManager.java:
##########
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import org.apache.hadoop.fs.Path;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields as they existed at a certain point in time.
+ * If a field was renamed relative to the queried schema, its old name, the one valid at the provided time, is returned.
+ * If the type of a field was changed, its old type is returned, and a projection is created to convert the old type to the queried one.
+ */
+public final class FlinkInternalSchemaManager implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  /**
+   * Creates the manager if schema evolution enabled.
+   */
+  public static Option<FlinkInternalSchemaManager> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new FlinkInternalSchemaManager(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  FlinkInternalSchemaManager(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  /**
+   * Returns query schema as InternalSchema.
+   */
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  /**
+   * Returns schema of fileSplit.
+   */
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  /**
+   * Returns schema of mor fileSplit.
+   */
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    Option<String> basePath = split.getBasePath();
+    String commitTime;
+    if (basePath.isPresent()) {
+      String name = new Path(basePath.get()).getName();
+      commitTime = FSUtils.getCommitTime(name);
+    } else {
+      commitTime = split.getLatestCommit();

Review Comment:
   How about the log files then? Can the log file commit time be considered here?





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314705169

   @danny0405 In fact, I'd prefer not to replace the two-schema approach with a one-schema approach, as that does not look like part of the Flink schema evolution feature.
   Moreover, the two-schema approach still keeps coming up in PRs:
   https://github.com/apache/hudi/pull/7187/files
   I think it's reasonable to go back to my original idea of adding `withInternalSchema(InternalSchema internalSchema)` to `HoodieUnMergedLogRecordScanner`: https://github.com/apache/hudi/pull/5830#discussion_r925340228
   




[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022517049


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/InternalSchema.java:
##########
@@ -66,6 +77,11 @@ public InternalSchema(Field... columns) {
     this(DEFAULT_VERSION_ID, Arrays.asList(columns));
   }
 
+  public InternalSchema(long versionId, Schema avroSchema) {
+    this(versionId, ((Types.RecordType) AvroInternalSchemaConverter.convertToField(avroSchema)).fields());
+    this.avroSchema = avroSchema;
+  }

Review Comment:
   Why wrap avroSchema into InternalSchema?





[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033063183


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -120,6 +121,12 @@ private FlinkOptions() {
       .withDescription("The default partition name in case the dynamic partition"
           + " column value is null/empty string");
 
+  public static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
+      .key(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key())
+      .booleanType()

Review Comment:
   There is no need to add the option if the key is the same as Hoodie core's.
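   
   A hedged sketch of the alternative, reading the flag through the shared key instead of declaring a second `ConfigOption` (assuming `false` remains the default):
   
       boolean schemaEvolutionEnabled = conf.getBoolean(
           HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), false);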



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/RowDataProjection.java:
##########
@@ -87,4 +88,8 @@ public Object[] projectAsValues(RowData rowData) {
     }
     return values;
   }
+
+  protected @Nullable Object rewriteVal(int pos, @Nullable Object val) {
+    return val;

Review Comment:
   `rewriteVal` => `getVal`; usually we do not override concrete methods, only abstract ones, and the override is not very friendly for base-class performance.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetReader.java:
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.table.format.cow.ParquetSplitReaderUtil;
+import org.apache.hudi.util.RowDataProjection;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.types.DataType;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * Base interface for hoodie parquet readers.
+ */
+public interface HoodieParquetReader extends Closeable {
+
+  boolean reachedEnd() throws IOException;
+
+  RowData nextRecord();
+
+  static HoodieParquetReader getReader(
+      InternalSchemaManager internalSchemaManager,
+      boolean utcTimestamp,
+      boolean caseSensitive,
+      Configuration conf,
+      String[] fieldNames,
+      DataType[] fieldTypes,
+      Map<String, Object> partitionSpec,
+      int[] selectedFields,
+      int batchSize,
+      Path path,
+      long splitStart,
+      long splitLength) throws IOException {
+    Option<RowDataProjection> castProjection;
+    InternalSchema fileSchema = internalSchemaManager.getFileSchema(path.getName());
+    if (fileSchema.isEmptySchema()) {
+      castProjection = Option.empty();

Review Comment:
   We can return a `HoodieParquetReader` directly here when we know `castProjection` is empty.



##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   For the `float`, `double` and `decimal` data types, what happens when the cast to the target type loses precision? Do we throw an exception here? And exactly what is the data type precedence (that is, which data types are castable) for each type?
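   
   A hedged illustration of the precision-loss half of the question, in plain Java rather than Hudi code:
   
       long big = 1_234_567_890_123_456_789L;
       float f = (float) big;  // Java narrows silently: 1.23456794E18
   
       java.math.BigDecimal d = new java.math.BigDecimal("12.345");
       // rescaling a decimal requires an explicit rounding mode,
       // otherwise setScale throws ArithmeticException when rounding is needed
       java.math.BigDecimal narrowed = d.setScale(1, java.math.RoundingMode.HALF_UP);  // 12.3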



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/InternalSchemaManager.java:
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.configuration.HadoopConfigurations;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Type;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.internal.schema.utils.InternalSchemaUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+/**
+ * This class is responsible for calculating the names and types of fields as they existed at a certain point in time.
+ * If a field was renamed relative to the queried schema, its old name, the one valid at the provided time, is returned.
+ * If the type of a field was changed, its old type is returned, and a projection is created to convert the old type to the queried one.
+ */
+public class InternalSchemaManager implements Serializable {
+
+  private static final long serialVersionUID = 1L;
+
+  public static final InternalSchemaManager DISABLED = new InternalSchemaManager(null, null, null, null);
+
+  private final Configuration conf;
+  private final InternalSchema querySchema;
+  private final String validCommits;
+  private final String tablePath;
+  private transient org.apache.hadoop.conf.Configuration hadoopConf;
+
+  public static InternalSchemaManager get(Configuration conf, HoodieTableMetaClient metaClient) {
+    if (!conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      return DISABLED;
+    }
+    Option<InternalSchema> internalSchema = new TableSchemaResolver(metaClient).getTableInternalSchemaFromCommitMetadata();
+    if (!internalSchema.isPresent() || internalSchema.get().isEmptySchema()) {
+      return DISABLED;
+    }
+    String validCommits = metaClient
+        .getCommitsAndCompactionTimeline()
+        .filterCompletedInstants()
+        .getInstants()
+        .map(HoodieInstant::getFileName)
+        .collect(Collectors.joining(","));
+    return new InternalSchemaManager(conf, internalSchema.get(), validCommits, metaClient.getBasePathV2().toString());
+  }
+
+  public InternalSchemaManager(Configuration conf, InternalSchema querySchema, String validCommits, String tablePath) {
+    this.conf = conf;
+    this.querySchema = querySchema;
+    this.validCommits = validCommits;
+    this.tablePath = tablePath;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema != null ? querySchema : InternalSchema.getEmptyInternalSchema();
+  }
+
+  InternalSchema getFileSchema(String fileName) {
+    InternalSchema querySchema = getQuerySchema();
+    if (querySchema.isEmptySchema()) {
+      return InternalSchema.getEmptyInternalSchema();
+    }
+    long commitInstantTime = Long.parseLong(FSUtils.getCommitTime(fileName));
+    InternalSchema fileSchemaUnmerged = InternalSchemaCache.getInternalSchemaByVersionId(
+        commitInstantTime, tablePath, getHadoopConf(), validCommits);
+    if (querySchema.equals(fileSchemaUnmerged)) {
+      return InternalSchema.getEmptyInternalSchema();
+    }
+    return new InternalSchemaMerger(fileSchemaUnmerged, querySchema, true, true).mergeSchema();
+  }
+
+  CastMap getCastMap(InternalSchema fileSchema, String[] queryFieldNames, DataType[] queryFieldTypes, int[] selectedFields) {
+    assertSchemasAreNotEmpty(getQuerySchema(), fileSchema);
+
+    CastMap castMap = new CastMap();
+    Map<Integer, Integer> posProxy = getPosProxy(fileSchema, queryFieldNames);
+    if (posProxy.isEmpty()) {
+      castMap.setFileFieldTypes(queryFieldTypes);
+      return castMap;
+    }
+    List<Integer> selectedFieldList = IntStream.of(selectedFields).boxed().collect(Collectors.toList());
+    List<DataType> fileSchemaAsDataTypes = AvroSchemaConverter.convertToDataType(
+        AvroInternalSchemaConverter.convert(fileSchema, "tableName")).getChildren();
+    DataType[] fileFieldTypes = new DataType[queryFieldTypes.length];
+    for (int i = 0; i < queryFieldTypes.length; i++) {
+      Integer posOfChangedType = posProxy.get(i);
+      if (posOfChangedType == null) {
+        fileFieldTypes[i] = queryFieldTypes[i];
+      } else {
+        DataType fileType = fileSchemaAsDataTypes.get(posOfChangedType);
+        fileFieldTypes[i] = fileType;
+        int selectedPos = selectedFieldList.indexOf(i);
+        if (selectedPos != -1) {
+          castMap.add(selectedPos, fileType.getLogicalType(), queryFieldTypes[i].getLogicalType());
+        }
+      }
+    }
+    castMap.setFileFieldTypes(fileFieldTypes);
+    return castMap;
+  }
+
+  String[] getFileFieldNames(InternalSchema fileSchema, String[] queryFieldNames) {
+    assertSchemasAreNotEmpty(getQuerySchema(), fileSchema);
+
+    Map<String, String> renamedCols = InternalSchemaUtils.collectRenameCols(fileSchema, getQuerySchema());
+    if (renamedCols.isEmpty()) {
+      return queryFieldNames;
+    }
+    return Arrays.stream(queryFieldNames).map(name -> renamedCols.getOrDefault(name, name)).toArray(String[]::new);
+  }
+
+  private Map<Integer, Integer> getPosProxy(InternalSchema fileSchema, String[] queryFieldNames) {
+    Map<Integer, Pair<Type, Type>> changedCols = InternalSchemaUtils.collectTypeChangedCols(getQuerySchema(), fileSchema);
+    HashMap<Integer, Integer> posProxy = new HashMap<>(changedCols.size());
+    List<String> fieldNameList = Arrays.asList(queryFieldNames);
+    List<Types.Field> columns = getQuerySchema().columns();
+    changedCols.forEach((posInSchema, typePair) -> {
+      String name = columns.get(posInSchema).name();
+      int posInType = fieldNameList.indexOf(name);
+      posProxy.put(posInType, posInSchema);
+    });
+    return Collections.unmodifiableMap(posProxy);
+  }
+
+  private org.apache.hadoop.conf.Configuration getHadoopConf() {
+    if (hadoopConf == null) {
+      hadoopConf = HadoopConfigurations.getHadoopConf(conf);
+    }
+    return hadoopConf;
+  }
+
+  private static void assertSchemasAreNotEmpty(InternalSchema schema1, InternalSchema schema2) {
+    Preconditions.checkArgument(!schema1.isEmptySchema(), "InternalSchema cannot be empty here");

Review Comment:
   There is no need to bind the two schema validations together, and we can give a more detailed exception message for each schema.
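   
   A minimal sketch of the suggested split, with a distinct message per schema (wording hypothetical):
   
       Preconditions.checkArgument(!getQuerySchema().isEmptySchema(), "Query schema cannot be empty here");
       Preconditions.checkArgument(!fileSchema.isEmptySchema(), "File schema cannot be empty here");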



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkTable.java:
##########
@@ -102,4 +108,9 @@ public <T extends SpecificRecordBase> Option<HoodieTableMetadataWriter> getMetad
       return Option.empty();
     }
   }
+
+  private static void setLatestInternalSchema(HoodieWriteConfig config, HoodieTableMetaClient metaClient) {
+    Option<InternalSchema> internalSchema = new TableSchemaResolver(metaClient).getTableInternalSchemaFromCommitMetadata();

Review Comment:
   Add a pre-condition check in case of null values.
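   
   A hedged sketch of the guard, assuming `ValidationUtils` from hudi-common is acceptable here:
   
       Option<InternalSchema> internalSchema =
           new TableSchemaResolver(metaClient).getTableInternalSchemaFromCommitMetadata();
       ValidationUtils.checkArgument(internalSchema.isPresent(),
           "Expected an internal schema in the commit metadata, but none was found");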





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033314251


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetReader.java:
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.table.format.cow.ParquetSplitReaderUtil;
+import org.apache.hudi.util.RowDataProjection;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.types.DataType;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * Base interface for hoodie parquet readers.
+ */
+public interface HoodieParquetReader extends Closeable {
+
+  boolean reachedEnd() throws IOException;
+
+  RowData nextRecord();
+
+  static HoodieParquetReader getReader(
+      InternalSchemaManager internalSchemaManager,
+      boolean utcTimestamp,
+      boolean caseSensitive,
+      Configuration conf,
+      String[] fieldNames,
+      DataType[] fieldTypes,
+      Map<String, Object> partitionSpec,
+      int[] selectedFields,
+      int batchSize,
+      Path path,
+      long splitStart,
+      long splitLength) throws IOException {
+    Option<RowDataProjection> castProjection;
+    InternalSchema fileSchema = internalSchemaManager.getFileSchema(path.getName());
+    if (fileSchema.isEmptySchema()) {
+      return new HoodieParquetSplitReader(
+          ParquetSplitReaderUtil.genPartColumnarRowReader(

Review Comment:
   You mean shared with another file split? I guess not, because `ParquetColumnarRowSplitReader` is not shareable. Currently, we always create a new parquet reader for each file.





[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1326200670

   > @danny0405 @xiarixiaoyao I reworked this PR. Could you pls take a look
   > 
   > * Reverted all changes in InternalSchema
   > * Reverted all changes in COWInputFormat and MORInputFormat
   > * Added new tests with metafields query and count(*) query
   > * Introduced new interface for parquet reader `HoodieParquetReader`
   > * Implemented 2 readers: "reader as is" `HoodieParquetSplitReader` and "schema evolution reader" `HoodieParquetEvolvedSplitReader`
   > 
   > Thus, we follow the approach proposed above:
   > 
   > 1. fetch the original schema when the file was committed, read the record as is
   > 2. project the record with latest read schema if needed
   > 
   > Almost all schema evolution code is separated from inputFormat. The code is placed in `InternalSchemaManager` and `CastMap`.
   
   I would review it tomorrow; I see there is a conflict, can we resolve it first?




[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314934901

   > What do you mean by the rest of the cases? Isn't schema evolution either enabled or disabled?
   
   Even though schema evolution is enabled, there might not be an `internalSchema`. In that case we use the reader schema instead of the writer schema; see the Spark example in [LogFileIterator.scala](https://github.com/apache/hudi/blob/6b0b03b12b5b35efd16eb976d48edba876803ca0/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/LogFileIterator.scala#L87)
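   
   A hedged sketch of that fallback (the variable names are hypothetical):
   
       // prefer the writer schema resolved from the internal schema when it exists,
       // otherwise fall back to the reader (query) schema
       Schema scannerSchema = internalSchemaOpt.isPresent() ? writerSchema : readerSchema;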




[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022510833


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatReader.java:
##########
@@ -52,9 +53,9 @@ public class HoodieLogFormatReader implements HoodieLogFormat.Reader {
 
   private static final Logger LOG = LogManager.getLogger(HoodieLogFormatReader.class);
 
-  HoodieLogFormatReader(FileSystem fs, List<HoodieLogFile> logFiles, Schema readerSchema, boolean readBlocksLazily,

Review Comment:
   We can remove internalSchema directly.
   internalSchema is only used to indicate that the current hudi table has undergone schema evolution, and we should use the writeSchema to read a log block.
   
   If schemaEvolution is disabled, let's pass readerSchema as null.





[GitHub] [hudi] voonhous commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
voonhous commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1360801005

   @trushev 
   
   Yes, this is what I intend to work on.  
   
   What you described are operations made entirely in Flink SQL. I was thinking of cross-engine operations.
   
   i.e. for tables that were evolved using Avro Schema Resolution (ASR) via Spark but read using Flink, the same error will be thrown too.




[GitHub] [hudi] waywtdcc commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1306552952

   Is there any progress on this PR?




[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1018636538


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -91,6 +91,11 @@ public static InternalSchema convert(Schema schema) {
     return new InternalSchema(fields);
   }
 
+  /** Convert an avro schema into internalSchema with given versionId. */
+  public static InternalSchema convertToEmpty(Schema schema) {
+    return new InternalSchema(InternalSchema.EMPTY_SCHEMA_VERSION_ID, schema);

Review Comment:
   This is also confusing: an internal schema with an 'empty' version id that still holds an Avro schema internally. Please clarify it.





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1185308852

   OK, then I will fix the typo in the commit message (`HUDI-3983` => `HUDI-3981`) along with the comment fixes.




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927312625


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##########
@@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
       return this;
     }
 
+    public Builder withInternalSchema(InternalSchema internalSchema) {
+      this.internalSchema = internalSchema;
+      return this;

Review Comment:
   I used the second schema here to be consistent with `HoodieMergedLogRecordScanner`, which already uses this approach to scan logs in `HoodieMergeOnReadRDD#scanLog`.
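   
   A hedged usage sketch of that builder call, mirroring how the merged scanner is configured (the surrounding setup is assumed):
   
       HoodieUnMergedLogRecordScanner scanner = HoodieUnMergedLogRecordScanner.newBuilder()
           .withFileSystem(fs)
           .withLogFilePaths(logPaths)
           .withReaderSchema(readerSchema)
           .withInternalSchema(internalSchema)  // the builder method under discussion
           .withLatestInstantTime(latestCommitTime)
           .withLogRecordScannerCallback(callback)
           .build();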





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927287333


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   I just moved this code snippet from `HoodieMergeHelper` to `BaseMergeHelper` as is. Anyway, I will think about avoiding the unnecessary checks you pointed out.





[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927365423


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -279,7 +279,7 @@ private void saveInternalSchema(HoodieTable table, String instantTime, HoodieCom
     FileBasedInternalSchemaStorageManager schemasManager = new FileBasedInternalSchemaStorageManager(table.getMetaClient());
     if (!historySchemaStr.isEmpty() || Boolean.parseBoolean(config.getString(HoodieCommonConfig.RECONCILE_SCHEMA.key()))) {
       InternalSchema internalSchema;
-      Schema avroSchema = HoodieAvroUtils.createHoodieWriteSchema(new Schema.Parser().parse(config.getSchema()));
+      Schema avroSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
       if (historySchemaStr.isEmpty()) {

Review Comment:
   Does
   
   `HoodieAvroUtils.createHoodieWriteSchema(String schema, boolean withOperationField)`
   
   work here?





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1330080879

   The CI build failure is due to the broken master branch. I've pushed the fix: https://github.com/apache/hudi/pull/7319




[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1330175541

   It looks like Azure doesn't run on this PR anymore. A verifying PR has been opened: https://github.com/apache/hudi/pull/7321
   




[GitHub] [hudi] danny0405 merged pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 merged PR #5830:
URL: https://github.com/apache/hudi/pull/5830




[GitHub] [hudi] xiarixiaoyao commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1168481885

   @danny0405 @XuQianJin-Stars could you please help review this PR? Thanks very much.




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181245839

   ## CI report:
   
   * ef13a2d832c21b69938c958e1e84e4667d0b402d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858) 
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181168133

   ## CI report:
   
   * faa859369ddf9c3724487eb5a028186d0a970154 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856) 
   * ef13a2d832c21b69938c958e1e84e4667d0b402d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022530081


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataMergedLogRecordReader.java:
##########
@@ -54,15 +54,15 @@ public class HoodieMetadataMergedLogRecordReader extends HoodieMergedLogRecordSc
 
   private HoodieMetadataMergedLogRecordReader(FileSystem fs, String basePath, String partitionName,
                                               List<String> logFilePaths,
-                                              Schema readerSchema, String latestInstantTime,
+                                              InternalSchema readerSchema, String latestInstantTime,
                                               Long maxMemorySizeInBytes, int bufferSize,
                                               String spillableMapBasePath,
                                               ExternalSpillableMap.DiskMapType diskMapType,
                                               boolean isBitCaskDiskMapCompressionEnabled,
                                               Option<InstantRange> instantRange, boolean allowFullScan, boolean useScanV2) {
     super(fs, basePath, logFilePaths, readerSchema, latestInstantTime, maxMemorySizeInBytes, true, false, bufferSize,
         spillableMapBasePath, instantRange, diskMapType, isBitCaskDiskMapCompressionEnabled, false, allowFullScan,
-            Option.of(partitionName), InternalSchema.getEmptyInternalSchema(), useScanV2);
+            Option.of(partitionName), useScanV2);

Review Comment:
   Agreed, the current plan is to revert all changes not related to Flink: https://github.com/apache/hudi/pull/5830#issuecomment-1314919173





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864150


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##########
@@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
       return this;
     }
 
+    public Builder withInternalSchema(InternalSchema internalSchema) {
+      this.internalSchema = internalSchema;
+      return this;

Review Comment:
   I reverted the changes in `HoodieMergedLogRecordScanner`. Now there is only one schema, `InternalSchema`, which wraps `org.apache.avro.Schema`. The same approach is used in `HoodieUnMergedLogRecordScanner`.





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864979


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   The fields `actualFieldNames`, `actualFieldTypes`, `actualSchema`, and `querySchema` have been removed.





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002865594


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
##########
@@ -99,10 +113,36 @@ public CopyOnWriteInputFormat(
     this.selectedFields = selectedFields;
     this.conf = new SerializableConfiguration(conf);
     this.utcTimestamp = utcTimestamp;
+    this.schemaEvolutionContext = SchemaEvolutionContext.of(flinkConf);
   }
 
   @Override
   public void open(FileInputSplit fileSplit) throws IOException {
+    String[] actualFieldNames;
+    DataType[] actualFieldTypes;
+    if (schemaEvolutionContext.isPresent()) {
+      SchemaEvolutionContext context = schemaEvolutionContext.get();
+      InternalSchema actualSchema = context.getActualSchema(fileSplit);

Review Comment:
   Moved this logic to a separate method, `setActualFields`.





[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1328419786

   Reviewing now ~




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033190639


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -120,6 +121,12 @@ private FlinkOptions() {
       .withDescription("The default partition name in case the dynamic partition"
           + " column value is null/empty string");
 
+  public static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
+      .key(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key())
+      .booleanType()

Review Comment:
   Replaced with the deprecated `conf.getBoolean(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), false)`.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -120,6 +121,12 @@ private FlinkOptions() {
       .withDescription("The default partition name in case the dynamic partition"
           + " column value is null/empty string");
 
+  public static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
+      .key(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key())
+      .booleanType()

Review Comment:
   If this ConfigOption does not exist, we need to get the value from the Flink conf using the deprecated
   `conf.getBoolean(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), false)`; see the sketch below.
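   
   As a sketch, the lookup could be wrapped in a small helper (the helper name is hypothetical; `Configuration#getBoolean(String, boolean)` is the deprecated string-keyed accessor mentioned above):
   
   ```java
   import org.apache.flink.configuration.Configuration;
   
   import org.apache.hudi.common.config.HoodieCommonConfig;
   
   // Hypothetical helper: resolves the schema evolution flag from the Flink
   // configuration without declaring a dedicated ConfigOption.
   public final class SchemaEvolutionOptions {
     private SchemaEvolutionOptions() {
     }
   
     public static boolean isEnabled(Configuration conf) {
       // Deprecated string-keyed accessor, defaulting to false when unset.
       return conf.getBoolean(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), false);
     }
   }
   ```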





[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033279063


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetReader.java:
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.table.format.cow.ParquetSplitReaderUtil;
+import org.apache.hudi.util.RowDataProjection;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.types.DataType;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * Base interface for hoodie parquet readers.
+ */
+public interface HoodieParquetReader extends Closeable {
+
+  boolean reachedEnd() throws IOException;
+
+  RowData nextRecord();
+
+  static HoodieParquetReader getReader(
+      InternalSchemaManager internalSchemaManager,
+      boolean utcTimestamp,
+      boolean caseSensitive,
+      Configuration conf,
+      String[] fieldNames,
+      DataType[] fieldTypes,
+      Map<String, Object> partitionSpec,
+      int[] selectedFields,
+      int batchSize,
+      Path path,
+      long splitStart,
+      long splitLength) throws IOException {
+    Option<RowDataProjection> castProjection;
+    InternalSchema fileSchema = internalSchemaManager.getFileSchema(path.getName());
+    if (fileSchema.isEmptySchema()) {
+      return new HoodieParquetSplitReader(
+          ParquetSplitReaderUtil.genPartColumnarRowReader(

Review Comment:
   Can the `HoodieParquetSplitReader` be shared?





[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033296914


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetSplitReader.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader;
+
+import org.apache.flink.table.data.RowData;
+
+import java.io.IOException;
+
+/**
+ * Hoodie wrapper for flink parquet reader.
+ */
+public final class HoodieParquetSplitReader implements HoodieParquetReader {
+  private final ParquetColumnarRowSplitReader reader;
+
+  public HoodieParquetSplitReader(ParquetColumnarRowSplitReader reader) {
+    this.reader = reader;
+  }

Review Comment:
   `ParquetColumnarRowSplitReader` can implement `HoodieParquetReader` directly.
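   
   A sketch of the suggested direction, assuming the `HoodieParquetReader` interface shape shown earlier in this thread (placeholder bodies stand in for the real reader logic):
   
   ```java
   import java.io.IOException;
   
   import org.apache.flink.table.data.RowData;
   
   // Sketch: if the split reader implemented the interface directly,
   // the HoodieParquetSplitReader wrapper would become unnecessary.
   public class ParquetColumnarRowSplitReader implements HoodieParquetReader {
   
     @Override
     public boolean reachedEnd() throws IOException {
       return true; // placeholder: the real reader checks for remaining batches
     }
   
     @Override
     public RowData nextRecord() {
       return null; // placeholder: the real reader returns the next columnar row
     }
   
     @Override
     public void close() throws IOException {
       // placeholder: the real reader releases file handles and column readers
     }
   }
   ```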





[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(R…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152496441

   ## CI report:
   
   * 152d9abfe646e966dd40171a15fd5faa5e0a4594 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211) 
   * 06a66b2cc3450cd29b13b755976480317e134b4c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(R…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1151959011

   ## CI report:
   
   * 152d9abfe646e966dd40171a15fd5faa5e0a4594 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1154719742

   ## CI report:
   
   * 80d33308c8a4e290c0c7b66fff4023e8825f8163 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252) 
   * 5164437958be477aa84e5acc151cda008a8c8607 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179857893

   ## CI report:
   
   * 2dc34ce7581e1f0c631901ed9060837343220f2f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818) 
   * cb28b4f297e0e0b41ce9ea46b8be5002190e9f94 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179709180

   ## CI report:
   
   * 197780acd4560103dbe846d7bf09bd50efa80066 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754) 
   * c71cb55bf081afc59b3c323e59e825a9e482e3c4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1180921734

   ## CI report:
   
   * 239534226d5cf6cbef8ef1e8dc454daf3dacf20b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851) 
   * faa859369ddf9c3724487eb5a028186d0a970154 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927309714


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1667,16 +1667,14 @@ public void reOrderColPosition(String colName, String referColName, TableChange.
   private Pair<InternalSchema, HoodieTableMetaClient> getInternalSchemaAndMetaClient() {
     HoodieTableMetaClient metaClient = createMetaClient(true);
     TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
-    Option<InternalSchema> internalSchemaOption = schemaUtil.getTableInternalSchemaFromCommitMetadata();
-    if (!internalSchemaOption.isPresent()) {
-      throw new HoodieException(String.format("cannot find schema for current table: %s", config.getBasePath()));
-    }
-    return Pair.of(internalSchemaOption.get(), metaClient);

Review Comment:
   OK, thanks for the review. I will think about decoupling the schema evolution code from the rest.





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927375589


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -279,7 +279,7 @@ private void saveInternalSchema(HoodieTable table, String instantTime, HoodieCom
     FileBasedInternalSchemaStorageManager schemasManager = new FileBasedInternalSchemaStorageManager(table.getMetaClient());
     if (!historySchemaStr.isEmpty() || Boolean.parseBoolean(config.getString(HoodieCommonConfig.RECONCILE_SCHEMA.key()))) {
       InternalSchema internalSchema;
-      Schema avroSchema = HoodieAvroUtils.createHoodieWriteSchema(new Schema.Parser().parse(config.getSchema()));
+      Schema avroSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
       if (historySchemaStr.isEmpty()) {

Review Comment:
   Yes, it does. Fine, I'll replace the usage of `HoodieAvroUtils.addMetadataFields(Schema, boolean)`
   with `HoodieAvroUtils.createHoodieWriteSchema(String, boolean)`, roughly as sketched below.
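   
   Roughly, the replacement would be (a sketch based on the diff and the overload confirmed above; `config` is the write config from the surrounding method):
   
   ```java
   // The confirmed overload parses the schema string and adds the metadata
   // fields in one call, replacing the explicit
   // HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(...), ...) variant.
   Schema avroSchema = HoodieAvroUtils.createHoodieWriteSchema(
       config.getSchema(), config.allowOperationMetadataField());
   ```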





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927284257


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   There is no correct schema at the moment `MergeOnReadInputFormat` is constructed:
   baseFile1 with schema1 {id: int, value: int}
   baseFile2 with schema2 {id: int, value: long} -- the read schema
   Both files will be passed to `MergeOnReadInputFormat#open` as `MergeOnReadInputSplit`s,
   so we need to read baseFile1 with schema1 and then cast `value: int` to `value: long` using `CastMap`; see the sketch below.
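   
   A minimal sketch of the idea (the real `CastMap` API may differ; only the int-to-long conversion mirrors the example above):
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.Function;
   
   // Sketch: maps a field position to a conversion applied after reading the file.
   public final class CastMapSketch {
     private final Map<Integer, Function<Object, Object>> castByPos = new HashMap<>();
   
     public void add(int pos, Function<Object, Object> cast) {
       castByPos.put(pos, cast);
     }
   
     // Applies the registered cast for the position, if any; identity otherwise.
     public Object castIfNeeded(int pos, Object val) {
       Function<Object, Object> cast = castByPos.get(pos);
       return cast == null ? val : cast.apply(val);
     }
   }
   
   // Usage for the example above: file schema {id: int, value: int} read against
   // query schema {id: int, value: long} -- register a cast for field position 1:
   //   castMap.add(1, v -> ((Integer) v).longValue());
   ```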
   
   





[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288409306

   @hudi-bot run azure




[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1021080563


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatReader.java:
##########
@@ -141,4 +139,8 @@ public HoodieLogBlock prev() throws IOException {
     return this.currentReader.prev();
   }
 
+  private Schema getReaderSchema() {
+    boolean useWriterSchema = !readerSchema.isEmptySchema();
+    return useWriterSchema ? null : readerSchema.getAvroSchema();

Review Comment:
   I used a variable here on purpose to point out `useWriterSchema`.
   Essentially, it represents the removed code:
   https://github.com/apache/hudi/blob/release-0.12.1/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L162-L169





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033136057


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/InternalSchemaManager.java:
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.configuration.HadoopConfigurations;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Type;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.internal.schema.utils.InternalSchemaUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+/**
+ * This class is responsible for calculating names and types of fields that are actual at a certain point in time.
+ * If field is renamed in queried schema, its old name will be returned, which is relevant at the provided time.
+ * If type of field is changed, its old type will be returned, and projection will be created that will convert the old type to the queried one.
+ */
+public class InternalSchemaManager implements Serializable {
+
+  private static final long serialVersionUID = 1L;
+
+  public static final InternalSchemaManager DISABLED = new InternalSchemaManager(null, null, null, null);
+
+  private final Configuration conf;
+  private final InternalSchema querySchema;
+  private final String validCommits;
+  private final String tablePath;
+  private transient org.apache.hadoop.conf.Configuration hadoopConf;
+
+  public static InternalSchemaManager get(Configuration conf, HoodieTableMetaClient metaClient) {
+    if (!conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      return DISABLED;
+    }
+    Option<InternalSchema> internalSchema = new TableSchemaResolver(metaClient).getTableInternalSchemaFromCommitMetadata();
+    if (!internalSchema.isPresent() || internalSchema.get().isEmptySchema()) {
+      return DISABLED;
+    }
+    String validCommits = metaClient
+        .getCommitsAndCompactionTimeline()
+        .filterCompletedInstants()
+        .getInstants()
+        .map(HoodieInstant::getFileName)
+        .collect(Collectors.joining(","));
+    return new InternalSchemaManager(conf, internalSchema.get(), validCommits, metaClient.getBasePathV2().toString());
+  }
+
+  public InternalSchemaManager(Configuration conf, InternalSchema querySchema, String validCommits, String tablePath) {
+    this.conf = conf;
+    this.querySchema = querySchema;
+    this.validCommits = validCommits;
+    this.tablePath = tablePath;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema != null ? querySchema : InternalSchema.getEmptyInternalSchema();
+  }
+
+  InternalSchema getFileSchema(String fileName) {
+    InternalSchema querySchema = getQuerySchema();
+    if (querySchema.isEmptySchema()) {
+      return InternalSchema.getEmptyInternalSchema();
+    }
+    long commitInstantTime = Long.parseLong(FSUtils.getCommitTime(fileName));
+    InternalSchema fileSchemaUnmerged = InternalSchemaCache.getInternalSchemaByVersionId(
+        commitInstantTime, tablePath, getHadoopConf(), validCommits);
+    if (querySchema.equals(fileSchemaUnmerged)) {
+      return InternalSchema.getEmptyInternalSchema();
+    }
+    return new InternalSchemaMerger(fileSchemaUnmerged, querySchema, true, true).mergeSchema();
+  }
+
+  CastMap getCastMap(InternalSchema fileSchema, String[] queryFieldNames, DataType[] queryFieldTypes, int[] selectedFields) {
+    assertSchemasAreNotEmpty(getQuerySchema(), fileSchema);
+
+    CastMap castMap = new CastMap();
+    Map<Integer, Integer> posProxy = getPosProxy(fileSchema, queryFieldNames);
+    if (posProxy.isEmpty()) {
+      castMap.setFileFieldTypes(queryFieldTypes);
+      return castMap;
+    }
+    List<Integer> selectedFieldList = IntStream.of(selectedFields).boxed().collect(Collectors.toList());
+    List<DataType> fileSchemaAsDataTypes = AvroSchemaConverter.convertToDataType(
+        AvroInternalSchemaConverter.convert(fileSchema, "tableName")).getChildren();
+    DataType[] fileFieldTypes = new DataType[queryFieldTypes.length];
+    for (int i = 0; i < queryFieldTypes.length; i++) {
+      Integer posOfChangedType = posProxy.get(i);
+      if (posOfChangedType == null) {
+        fileFieldTypes[i] = queryFieldTypes[i];
+      } else {
+        DataType fileType = fileSchemaAsDataTypes.get(posOfChangedType);
+        fileFieldTypes[i] = fileType;
+        int selectedPos = selectedFieldList.indexOf(i);
+        if (selectedPos != -1) {
+          castMap.add(selectedPos, fileType.getLogicalType(), queryFieldTypes[i].getLogicalType());
+        }
+      }
+    }
+    castMap.setFileFieldTypes(fileFieldTypes);
+    return castMap;
+  }
+
+  String[] getFileFieldNames(InternalSchema fileSchema, String[] queryFieldNames) {
+    assertSchemasAreNotEmpty(getQuerySchema(), fileSchema);
+
+    Map<String, String> renamedCols = InternalSchemaUtils.collectRenameCols(fileSchema, getQuerySchema());
+    if (renamedCols.isEmpty()) {
+      return queryFieldNames;
+    }
+    return Arrays.stream(queryFieldNames).map(name -> renamedCols.getOrDefault(name, name)).toArray(String[]::new);
+  }
+
+  private Map<Integer, Integer> getPosProxy(InternalSchema fileSchema, String[] queryFieldNames) {
+    Map<Integer, Pair<Type, Type>> changedCols = InternalSchemaUtils.collectTypeChangedCols(getQuerySchema(), fileSchema);
+    HashMap<Integer, Integer> posProxy = new HashMap<>(changedCols.size());
+    List<String> fieldNameList = Arrays.asList(queryFieldNames);
+    List<Types.Field> columns = getQuerySchema().columns();
+    changedCols.forEach((posInSchema, typePair) -> {
+      String name = columns.get(posInSchema).name();
+      int posInType = fieldNameList.indexOf(name);
+      posProxy.put(posInType, posInSchema);
+    });
+    return Collections.unmodifiableMap(posProxy);
+  }
+
+  private org.apache.hadoop.conf.Configuration getHadoopConf() {
+    if (hadoopConf == null) {
+      hadoopConf = HadoopConfigurations.getHadoopConf(conf);
+    }
+    return hadoopConf;
+  }
+
+  private static void assertSchemasAreNotEmpty(InternalSchema schema1, InternalSchema schema2) {
+    Preconditions.checkArgument(!schema1.isEmptySchema(), "InternalSchema cannot be empty here");

Review Comment:
   Removed the method and replaced the message "InternalSchema..." with "querySchema..."
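   For orientation, a hedged sketch of how a format might consult the manager above when opening a split (the `metaClient`, `baseFileName`, and query-side arrays here are assumptions, not the exact PR wiring):
   
   ```java
   // An empty file schema means the file already matches the query schema,
   // so no renames or casts are needed.
   InternalSchemaManager manager = InternalSchemaManager.get(flinkConf, metaClient);
   InternalSchema fileSchema = manager.getFileSchema(baseFileName);
   if (!fileSchema.isEmptySchema()) {
     // Field names as written in the file (renames resolved back)...
     String[] fileFieldNames = manager.getFileFieldNames(fileSchema, queryFieldNames);
     // ...and per-position casts for columns whose type changed.
     CastMap castMap = manager.getCastMap(fileSchema, queryFieldNames, queryFieldTypes, selectedFields);
   }
   ```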



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/RowDataProjection.java:
##########
@@ -87,4 +88,8 @@ public Object[] projectAsValues(RowData rowData) {
     }
     return values;
   }
+
+  protected @Nullable Object rewriteVal(int pos, @Nullable Object val) {
+    return val;

Review Comment:
   renamed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033136129


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/RowDataProjection.java:
##########
@@ -87,4 +88,8 @@ public Object[] projectAsValues(RowData rowData) {
     }
     return values;
   }
+
+  protected @Nullable Object rewriteVal(int pos, @Nullable Object val) {
+    return val;

Review Comment:
   renamed rewriteVal => getVal
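   As a rough illustration of why the hook exists, a self-contained sketch of the pattern (simplified stand-ins, not the PR's actual classes):
   
   ```java
   // A projection copies each field through an overridable getVal hook;
   // the base implementation is the identity, as in RowDataProjection.
   class SimpleProjection {
     protected Object getVal(int pos, Object val) {
       return val;
     }
   
     Object[] project(Object[] row) {
       Object[] out = new Object[row.length];
       for (int i = 0; i < row.length; i++) {
         out[i] = getVal(i, row[i]);
       }
       return out;
     }
   }
   
   // A casting subclass overrides the hook to convert evolved types,
   // e.g. widening int -> bigint for a column whose type changed.
   class CastingProjection extends SimpleProjection {
     @Override
     protected Object getVal(int pos, Object val) {
       return (pos == 1 && val instanceof Integer) ? ((Integer) val).longValue() : val;
     }
   }
   ```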



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1331838969

   @danny0405 @xiarixiaoyao @flashJd thank you for reviewing this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314822778

   > Moreover, the two-schema approach still keeps coming up in PRs
   > https://github.com/apache/hudi/pull/7187/files
   
   Isn't this proof that the two-schema approach may cause bugs in corner cases? Personally, I prefer that we keep only one reader schema interface here. Either we have some tool for fetching the right Avro schema in evolution use cases, or we keep only the internal schema, which is compatible with evolution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314818826

   > > [3981.patch.zip](https://github.com/apache/hudi/files/9977268/3981.patch.zip) Thanks for the contribution; I have reviewed some of the parts and left a local patch here and some comments ~
   > 
   > Why do we use writerSchema instead of readerSchema when schema evolution is disabled? 3981.patch:
   > 
   > ```java
   > public Schema getAvroSchema() {
   >   if (isEmptySchema()) {
   >     return null;
   >   }
   > ```
   
   Actually, I'm totally confused by these schema use cases. Can we list a summary here of which cases use the writer/reader schema, with schema evolution enabled/disabled?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314898583

   > Actually, I'm totally confused by these schema use cases. Can we list a summary here of which cases use the writer/reader schema, with schema evolution enabled/disabled?
   
   Schema evolution **disabled**:
   Reader schema example [FormatUtils](https://github.com/apache/hudi/blob/6b0b03b12b5b35efd16eb976d48edba876803ca0/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FormatUtils.java#L173-L189)
   Writer schema example [HoodieLogCompactionPlanGenerator](https://github.com/apache/hudi/blob/6b0b03b12b5b35efd16eb976d48edba876803ca0/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/HoodieLogCompactionPlanGenerator.java#L85-L95)
   
   Schema evolution **enabled**:
   The writer schema is used in all the previous cases (same as when schema evolution is disabled), provided we can find the appropriate `internalSchema`. For example, [HoodieCompactor](https://github.com/apache/hudi/blob/6b0b03b12b5b35efd16eb976d48edba876803ca0/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java#L163-L204). **Why do we use the writer schema here?** The main idea is to read the log block as is. Then we "cast" the log block using `HoodieAvroUtils.rewriteRecordWithNewSchema` here: [AbstractHoodieLogRecordReader](https://github.com/apache/hudi/blob/6b0b03b12b5b35efd16eb976d48edba876803ca0/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L634), as sketched after this summary.
   
   Reader schema: all remaining cases
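   A hedged sketch of that "cast" step (the exact overload of `rewriteRecordWithNewSchema` is an assumption; see the `AbstractHoodieLogRecordReader` link above):
   
   ```java
   // Read the record with the writer schema, then rewrite it into the
   // merged/query schema, applying renames collected from the schema history.
   GenericRecord evolved = (GenericRecord) HoodieAvroUtils.rewriteRecordWithNewSchema(
       record, mergedAvroSchema, renameCols);
   ```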
   
   
   > Isn't this proof that the two-schema approach may cause bugs in corner cases?
   
   Agreed. Maybe we should create another `AbstractHoodieEvolveLogReader` with `InternalSchema` only.
   It looks like a lot of changes related not to Flink but to the core of schema evolution and the other engines in Hudi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1316191271

   @trushev @danny0405 @xiarixiaoyao Thank you folks for pushing the schema evolution support in Flink! Do you think we can merge this before the 0.13.0 code freeze (Dec 12)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1001518524


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -166,29 +177,56 @@ public void open(MergeOnReadInputSplit split) throws IOException {
     this.currentReadCount = 0L;
     this.closed = false;
     this.hadoopConf = HadoopConfigurations.getHadoopConf(this.conf);
+    if (schemaEvolutionContext.isPresent()) {
+      SchemaEvolutionContext context = schemaEvolutionContext.get();
+      querySchema = context.getQuerySchema();
+      actualSchema = context.getActualSchema(split);
+      actualFieldNames = context.getFieldNames(actualSchema);
+      actualFieldTypes = context.getFieldTypes(actualSchema);
+    } else {
+      querySchema = InternalSchema.getEmptyInternalSchema();
+      actualSchema = InternalSchema.getEmptyInternalSchema();
+      actualFieldNames = fieldNames;
+      actualFieldTypes = fieldTypes;
+    }
+
     if (!(split.getLogPaths().isPresent() && split.getLogPaths().get().size() > 0)) {
-      if (split.getInstantRange() != null) {
+      if (split.getInstantRange().isPresent()) {
         // base file only with commit time filtering
         this.iterator = new BaseFileOnlyFilteringIterator(
-            split.getInstantRange(),
-            this.tableState.getRequiredRowType(),
+            split.getInstantRange().get(),
             getReader(split.getBasePath().get(), getRequiredPosWithCommitTime(this.requiredPos)));
+        int[] positions = IntStream.range(1, requiredPos.length + 1).toArray();
+        RowDataProjection projection = getCastProjection(positions)
+            .orElse(RowDataProjection.instance(tableState.getRequiredRowType(), positions));
+        projectRecordIterator(projection);
       } else {
         // base file only
         this.iterator = new BaseFileOnlyIterator(getRequiredSchemaReader(split.getBasePath().get()));
+        projectRecordIterator();
       }
     } else if (!split.getBasePath().isPresent()) {
       // log files only
       if (OptionsResolver.emitChangelog(conf)) {
         this.iterator = new LogFileOnlyIterator(getUnMergedLogFileIterator(split));
+        projectRecordIterator();
       } else {
         this.iterator = new LogFileOnlyIterator(getLogFileIterator(split));
+        projectRecordIterator();

Review Comment:
   You are right, fixed



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Type;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.internal.schema.utils.InternalSchemaUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.LogicalTypeRoot;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.time.LocalDate;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.BIGINT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DATE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DECIMAL;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DOUBLE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.FLOAT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.INTEGER;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.VARCHAR;
+
+/**
+ * CastMap is responsible for conversion of flink types when full schema evolution enabled.
+ */
+public final class CastMap implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  // Maps position to corresponding cast
+  private final Map<Integer, Cast> castMap = new HashMap<>();
+
+  /**
+   * Creates CastMap by comparing two schemes. Cast of a specific column is created if its type has changed.
+   */
+  public static CastMap of(String tableName, InternalSchema querySchema, InternalSchema actualSchema) {
+    DataType queryType = internalSchemaToDataType(tableName, querySchema);
+    DataType actualType = internalSchemaToDataType(tableName, actualSchema);
+    CastMap castMap = new CastMap();
+    InternalSchemaUtils.collectTypeChangedCols(querySchema, actualSchema).entrySet().stream()
+        .filter(e -> !isSameType(e.getValue().getLeft(), e.getValue().getRight()))

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179927407

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cb28b4f297e0e0b41ce9ea46b8be5002190e9f94 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(R…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1151956546

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 152d9abfe646e966dd40171a15fd5faa5e0a4594 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152595265

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06a66b2cc3450cd29b13b755976480317e134b4c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218) 
   * 7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1153571701

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220) 
   * 80d33308c8a4e290c0c7b66fff4023e8825f8163 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r925314390


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -279,7 +279,7 @@ private void saveInternalSchema(HoodieTable table, String instantTime, HoodieCom
     FileBasedInternalSchemaStorageManager schemasManager = new FileBasedInternalSchemaStorageManager(table.getMetaClient());
     if (!historySchemaStr.isEmpty() || Boolean.parseBoolean(config.getString(HoodieCommonConfig.RECONCILE_SCHEMA.key()))) {
       InternalSchema internalSchema;
-      Schema avroSchema = HoodieAvroUtils.createHoodieWriteSchema(new Schema.Parser().parse(config.getSchema()));
+      Schema avroSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
       if (historySchemaStr.isEmpty()) {

Review Comment:
   Why not add a new method: `HoodieAvroUtils#createHoodieWriteSchema(HoodieWriteConfig)`?
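   A minimal sketch of the suggested helper, derived from the replaced line above (its placement in `HoodieAvroUtils` is the suggestion, not existing code):
   
   ```java
   // Centralize the "parse schema + add metadata fields" pattern behind one call.
   public static Schema createHoodieWriteSchema(HoodieWriteConfig config) {
     return HoodieAvroUtils.addMetadataFields(
         new Schema.Parser().parse(config.getSchema()),
         config.allowOperationMetadataField());
   }
   ```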



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   Do we need to check the schema evolution for each file, or for each read/commit?



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FormatUtils.java:
##########
@@ -130,6 +132,7 @@ public static HoodieMergedLogRecordScanner logScanner(
         .withBasePath(split.getTablePath())
         .withLogFilePaths(split.getLogPaths().get())
         .withReaderSchema(logSchema)
+        .withInternalSchema(internalSchema)
         .withLatestInstantTime(split.getLatestCommit())

Review Comment:
   Just passing around a correct readSchema is enough ~



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   Same rule: just make the schema correct before constructing the `MergeOnReadInputFormat`, and let's not impose the components of schema evolution on any format/reader.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##########
@@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
       return this;
     }
 
+    public Builder withInternalSchema(InternalSchema internalSchema) {
+      this.internalSchema = internalSchema;
+      return this;

Review Comment:
   There is already a read schema; why do we pass around another schema? Whatever it is, please use just one schema!



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -453,6 +453,7 @@ private MergeOnReadInputFormat mergeOnReadInputFormat(
         this.requiredPos,
         this.conf.getString(FlinkOptions.PARTITION_DEFAULT_NAME),
         this.limit == NO_LIMIT_CONSTANT ? Long.MAX_VALUE : this.limit, // ParquetInputFormat always uses the limit value
+        this.conf,
         getParquetConf(this.conf, this.hadoopConf),

Review Comment:
   Avoid passing around huge objects like `conf`.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
##########
@@ -99,10 +113,36 @@ public CopyOnWriteInputFormat(
     this.selectedFields = selectedFields;
     this.conf = new SerializableConfiguration(conf);
     this.utcTimestamp = utcTimestamp;
+    this.schemaEvolutionContext = SchemaEvolutionContext.of(flinkConf);
   }
 
   @Override
   public void open(FileInputSplit fileSplit) throws IOException {
+    String[] actualFieldNames;
+    DataType[] actualFieldTypes;
+    if (schemaEvolutionContext.isPresent()) {
+      SchemaEvolutionContext context = schemaEvolutionContext.get();
+      InternalSchema actualSchema = context.getActualSchema(fileSplit);

Review Comment:
   Just make `fullFieldNames` and `fullFieldTypes` the ones after schema evolution, and move these schema evolution components/logic into a separate tool clazz.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Type;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.internal.schema.utils.InternalSchemaUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.LogicalTypeRoot;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.time.LocalDate;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.BIGINT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DATE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DECIMAL;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DOUBLE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.FLOAT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.INTEGER;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.VARCHAR;
+
+/**
+ * CastMap is responsible for conversion of flink types when full schema evolution enabled.
+ */

Review Comment:
   Add some documentation about the general rules for schema evolution ~



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating names and types of fields that are actual at a certain point in time.
+ * If field is renamed in queried schema, its old name will be returned, which is relevant at the provided time.
+ * If type of field is changed, its old type will be returned, and projection will be created that will convert the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    String commitTime = split.getBasePath()
+        .map(FSUtils::getCommitTime)
+        .orElse(split.getLatestCommit());
+    return getActualSchema(commitTime);
+  }
+
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+    return internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+    return AvroSchemaConverter.convertToDataType(

Review Comment:
   Can we give some explanations for these methods?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] flashJd commented on a diff in pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
flashJd commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r978244007


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/ITTestSchemaEvolution.java:
##########
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.sink;
+
+import org.apache.hudi.client.HoodieFlinkWriteClient;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.keygen.ComplexAvroKeyGenerator;
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
+import org.apache.hudi.table.HoodieTableFactory;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.TableResult;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.factories.FactoryUtil;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.CloseableIterator;
+import org.apache.flink.util.Preconditions;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.internal.schema.action.TableChange.ColumnPositionChange.ColumnPositionType.AFTER;
+import static org.apache.hudi.utils.TestConfigurations.ROW_TYPE;
+import static org.apache.hudi.utils.TestConfigurations.ROW_TYPE_EVOLUTION;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class ITTestSchemaEvolution extends AbstractTestBase {
+  @TempDir File tempFile;
+  StreamExecutionEnvironment env;
+  StreamTableEnvironment tEnv;
+
+  String[] expectedMergedResult = new String[] {
+      "+I[Danny, 10000.1, 23]",
+      "+I[Stephen, null, 33]",
+      "+I[Julian, 30000.3, 53]",
+      "+I[Fabian, null, 31]",
+      "+I[Sophia, null, 18]",
+      "+I[Emma, null, 20]",
+      "+I[Bob, null, 44]",
+      "+I[Han, null, 56]",
+      "+I[Alice, 90000.9, unknown]"
+  };
+
+  String[] expectedUnMergedResult = new String[] {
+      "+I[Danny, null, 23]",
+      "+I[Stephen, null, 33]",
+      "+I[Julian, null, 53]",
+      "+I[Fabian, null, 31]",
+      "+I[Sophia, null, 18]",
+      "+I[Emma, null, 20]",
+      "+I[Bob, null, 44]",
+      "+I[Han, null, 56]",
+      "+I[Alice, 90000.9, unknown]",
+      "+I[Danny, 10000.1, 23]",
+      "+I[Julian, 30000.3, 53]"
+  };
+
+  @BeforeEach
+  public void setUp() {
+    env = StreamExecutionEnvironment.getExecutionEnvironment();
+    env.setParallelism(1);
+    tEnv = StreamTableEnvironment.create(env);
+  }
+
+  @Test
+  public void testCopyOnWriteInputFormat() throws Exception {
+    testRead(defaultOptionMap(tempFile.getAbsolutePath()));
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatBaseFileOnlyIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.READ_AS_STREAMING.key(), true);
+    optionMap.put(FlinkOptions.READ_START_COMMIT.key(), FlinkOptions.START_COMMIT_EARLIEST);
+    testRead(optionMap);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatBaseFileOnlyFilteringIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.READ_AS_STREAMING.key(), true);
+    optionMap.put(FlinkOptions.READ_START_COMMIT.key(), 1);
+    testRead(optionMap);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatLogFileOnlyIteratorGetLogFileIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    testRead(optionMap);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatLogFileOnlyIteratorGetUnMergedLogFileIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.READ_AS_STREAMING.key(), true);
+    optionMap.put(FlinkOptions.READ_START_COMMIT.key(), FlinkOptions.START_COMMIT_EARLIEST);
+    optionMap.put(FlinkOptions.CHANGELOG_ENABLED.key(), true);
+    testRead(optionMap, expectedUnMergedResult);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatMergeIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 1);
+    testRead(optionMap, true);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatSkipMergeIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 1);
+    optionMap.put(FlinkOptions.MERGE_TYPE.key(), FlinkOptions.REALTIME_SKIP_MERGE);
+    testRead(optionMap, true, expectedUnMergedResult);
+  }
+
+  @SuppressWarnings({"SqlDialectInspection", "SqlNoDataSourceInspection"})
+  @Test
+  public void testCompaction() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 1);
+    testRead(optionMap, new String[0]);

Review Comment:
   Why use `new String[0]`? It confused me, as if there were no results when reading.
   At L293, `i < expected.size() && iterator.hasNext()` makes the test pass, but that makes no sense.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Type;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.internal.schema.utils.InternalSchemaUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.LogicalTypeRoot;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.time.LocalDate;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.BIGINT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DATE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DECIMAL;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DOUBLE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.FLOAT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.INTEGER;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.VARCHAR;
+
+/**
+ * CastMap is responsible for conversion of flink types when full schema evolution enabled.
+ */
+public final class CastMap implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  // Maps position to corresponding cast
+  private final Map<Integer, Cast> castMap = new HashMap<>();
+
+  /**
+   * Creates CastMap by comparing two schemes. Cast of a specific column is created if its type has changed.
+   */
+  public static CastMap of(String tableName, InternalSchema querySchema, InternalSchema actualSchema) {
+    DataType queryType = internalSchemaToDataType(tableName, querySchema);
+    DataType actualType = internalSchemaToDataType(tableName, actualSchema);
+    CastMap castMap = new CastMap();
+    InternalSchemaUtils.collectTypeChangedCols(querySchema, actualSchema).entrySet().stream()
+        .filter(e -> !isSameType(e.getValue().getLeft(), e.getValue().getRight()))

Review Comment:
   It's better to move `isSameType()` into `collectTypeChangedCols()`.
   Replace https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java#L212
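   A hedged sketch of the suggested refactor: apply the nullability-insensitive comparison inside `collectTypeChangedCols` itself, so callers need no extra filter (the `columns`/`findField` accessors are assumptions about the `InternalSchema` API):
   
   ```java
   public static Map<Integer, Pair<Type, Type>> collectTypeChangedCols(
       InternalSchema schema, InternalSchema otherSchema) {
     Map<Integer, Pair<Type, Type>> changed = new HashMap<>();
     List<Types.Field> fields = schema.columns();
     for (int i = 0; i < fields.size(); i++) {
       Types.Field other = otherSchema.findField(fields.get(i).name());
       // Skip columns whose types match up to nullability: not a real type change.
       if (other != null && !isSameType(fields.get(i).type(), other.type())) {
         changed.put(i, Pair.of(fields.get(i).type(), other.type()));
       }
     }
     return changed;
   }
   ```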



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating names and types of fields that are actual at a certain point in time.
+ * If field is renamed in queried schema, its old name will be returned, which is relevant at the provided time.
+ * If type of field is changed, its old type will be returned, and projection will be created that will convert the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    String commitTime = split.getBasePath()
+        .map(FSUtils::getCommitTime)

Review Comment:
   `getCommitTime` needs a full file name as input, but we pass the full basePath here; if the basePath contains '_', it will be parsed incorrectly.
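
   A sketch of the fix being suggested, assuming the base file name follows the usual `<fileId>_<writeToken>_<instantTime>.parquet` layout that `getCommitTime` splits on `_`; the path value is made up for illustration:
   ```java
   import org.apache.flink.core.fs.Path;
   
   import org.apache.hudi.common.fs.FSUtils;
   
   // Parse the commit time from the file name component only, so a '_' in a
   // directory name (e.g. the partition "my_part") cannot confuse the parser.
   String fileName = new Path("/tmp/tbl/my_part/f1-0_0-1-0_20220610045736.parquet").getName();
   String commitTime = FSUtils.getCommitTime(fileName); // -> "20220610045736"
   ```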



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -166,29 +177,56 @@ public void open(MergeOnReadInputSplit split) throws IOException {
     this.currentReadCount = 0L;
     this.closed = false;
     this.hadoopConf = HadoopConfigurations.getHadoopConf(this.conf);
+    if (schemaEvolutionContext.isPresent()) {
+      SchemaEvolutionContext context = schemaEvolutionContext.get();
+      querySchema = context.getQuerySchema();
+      actualSchema = context.getActualSchema(split);
+      actualFieldNames = context.getFieldNames(actualSchema);
+      actualFieldTypes = context.getFieldTypes(actualSchema);
+    } else {
+      querySchema = InternalSchema.getEmptyInternalSchema();
+      actualSchema = InternalSchema.getEmptyInternalSchema();
+      actualFieldNames = fieldNames;
+      actualFieldTypes = fieldTypes;
+    }
+
     if (!(split.getLogPaths().isPresent() && split.getLogPaths().get().size() > 0)) {
-      if (split.getInstantRange() != null) {
+      if (split.getInstantRange().isPresent()) {
         // base file only with commit time filtering
         this.iterator = new BaseFileOnlyFilteringIterator(
-            split.getInstantRange(),
-            this.tableState.getRequiredRowType(),
+            split.getInstantRange().get(),
             getReader(split.getBasePath().get(), getRequiredPosWithCommitTime(this.requiredPos)));
+        int[] positions = IntStream.range(1, requiredPos.length + 1).toArray();
+        RowDataProjection projection = getCastProjection(positions)
+            .orElse(RowDataProjection.instance(tableState.getRequiredRowType(), positions));
+        projectRecordIterator(projection);
       } else {
         // base file only
         this.iterator = new BaseFileOnlyIterator(getRequiredSchemaReader(split.getBasePath().get()));
+        projectRecordIterator();
       }
     } else if (!split.getBasePath().isPresent()) {
       // log files only
       if (OptionsResolver.emitChangelog(conf)) {
         this.iterator = new LogFileOnlyIterator(getUnMergedLogFileIterator(split));
+        projectRecordIterator();
       } else {
         this.iterator = new LogFileOnlyIterator(getLogFileIterator(split));
+        projectRecordIterator();

Review Comment:
   No need to call projectRecordIterator(), as the record has already been rewritten in https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L389



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1330057955

   > Can you squash and force push here. I didn't see the Azure CI history, let's re-trigger it.
   
   Done, waiting for Azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1331802428

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858",
       "triggerID" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "1288409306",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "1329995935",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "5b7eed269294cb9be8f0875517b062e45e7ddb84",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13340",
       "triggerID" : "5b7eed269294cb9be8f0875517b062e45e7ddb84",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5b7eed269294cb9be8f0875517b062e45e7ddb84 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13340) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288410026

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858",
       "triggerID" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1288409306",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927306341


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -279,7 +279,7 @@ private void saveInternalSchema(HoodieTable table, String instantTime, HoodieCom
     FileBasedInternalSchemaStorageManager schemasManager = new FileBasedInternalSchemaStorageManager(table.getMetaClient());
     if (!historySchemaStr.isEmpty() || Boolean.parseBoolean(config.getString(HoodieCommonConfig.RECONCILE_SCHEMA.key()))) {
       InternalSchema internalSchema;
-      Schema avroSchema = HoodieAvroUtils.createHoodieWriteSchema(new Schema.Parser().parse(config.getSchema()));
+      Schema avroSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()), config.allowOperationMetadataField());
       if (historySchemaStr.isEmpty()) {

Review Comment:
   1) Because `HoodieAvroUtils` lives in `hudi-common` while `HoodieWriteConfig` lives in `hudi-client-common`, which depends on the former.
   2) Working around (1) by using `HoodieConfig` instead of `HoodieWriteConfig` does not make sense because there is no `getSchema()` in `HoodieConfig`. There is not even an `AVRO_SCHEMA_STRING` for `HoodieConfig.getString(AVRO_SCHEMA_STRING)`. I think passing the raw string `hoodie.avro.schema` is an inappropriate approach.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1168479396

   Partial review, still looking. @trushev Overall, it looks good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1172032707

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 88ce744bc98ae26b81b00276a5e289c435188889 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181165434

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * faa859369ddf9c3724487eb5a028186d0a970154 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856) 
   * ef13a2d832c21b69938c958e1e84e4667d0b402d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] waywtdcc commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1306553527

   Is there any progress on this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1326039940

   @danny0405 @xiarixiaoyao I have reworked this PR. Could you please take a look?
   
   - Reverted all changes in COWInputFormat and MORInputFormat
   - Added new tests with a metafields query and a count(*) query
   - Introduced a new interface for the parquet reader: `HoodieParquetReader`
   - Implemented 2 readers: the "reader as is" `HoodieParquetSplitReader` and the "schema evolution reader" `HoodieParquetEvolvedSplitReader`
   
   Thus, we follow the approach proposed above:
   1) fetch the original schema that was in effect when the file was committed, and read the record as is
   2) project the record onto the latest read schema if needed
   
   Almost all schema evolution code is separated from the input formats; it lives in `InternalSchemaManager` and `CastMap`.
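
   For reference, a minimal sketch of how the second reader can decorate the first one (an assumed shape, not the exact PR code; it relies on `RowDataProjection#project(RowData)` behaving as it is used elsewhere in this thread):
   ```java
   import java.io.IOException;
   
   import org.apache.flink.table.data.RowData;
   
   import org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader;
   import org.apache.hudi.util.RowDataProjection;
   
   // Step 1 is done by the wrapped reader (it reads records with the schema the file
   // was committed with); step 2 projects each record onto the query schema.
   public final class HoodieParquetEvolvedSplitReader implements HoodieParquetReader {
     private final ParquetColumnarRowSplitReader reader;
     private final RowDataProjection castProjection;
   
     public HoodieParquetEvolvedSplitReader(ParquetColumnarRowSplitReader reader,
                                            RowDataProjection castProjection) {
       this.reader = reader;
       this.castProjection = castProjection;
     }
   
     @Override
     public boolean reachedEnd() throws IOException {
       return reader.reachedEnd();
     }
   
     @Override
     public RowData nextRecord() {
       return castProjection.project(reader.nextRecord()); // cast to the queried types
     }
   
     @Override
     public void close() throws IOException {
       reader.close();
     }
   }
   ```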


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1326479266

   Rebased on the latest master. Locally verified that the Flink tests pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1313166445

   > [3981.patch.zip](https://github.com/apache/hudi/files/9977268/3981.patch.zip) Thanks for the contribution, have reviewed some parts, and left a local patch here and some comments ~
   
   Why do we use writerSchema instead of readerSchema when schema evolution is disabled?
   3981.patch:
   ```java
   public Schema getAvroSchema() {
     if (isEmptySchema()) {
       return null;
     }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033285164


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetSplitReader.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader;
+
+import org.apache.flink.table.data.RowData;
+
+import java.io.IOException;
+
+/**
+ * Hoodie wrapper for flink parquet reader.
+ */
+public final class HoodieParquetSplitReader implements HoodieParquetReader {
+  private final ParquetColumnarRowSplitReader reader;
+
+  public HoodieParquetSplitReader(ParquetColumnarRowSplitReader reader) {

Review Comment:
   Can `ParquetColumnarRowSplitReader` implement `HoodieParquetReader` directly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033203642


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   The following example throws an exception:
   ```java
   CastMap castMap = new CastMap();
   castMap.add(0, new BigIntType(), new IntType()); // <---- error, cast long to int is unsupported
   ```
   ```
   java.lang.IllegalArgumentException: Cannot create cast BIGINT => INT at pos 0
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033179774


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetReader.java:
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.table.format.cow.ParquetSplitReaderUtil;
+import org.apache.hudi.util.RowDataProjection;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.types.DataType;
+
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * Base interface for hoodie parquet readers.
+ */
+public interface HoodieParquetReader extends Closeable {
+
+  boolean reachedEnd() throws IOException;
+
+  RowData nextRecord();
+
+  static HoodieParquetReader getReader(
+      InternalSchemaManager internalSchemaManager,
+      boolean utcTimestamp,
+      boolean caseSensitive,
+      Configuration conf,
+      String[] fieldNames,
+      DataType[] fieldTypes,
+      Map<String, Object> partitionSpec,
+      int[] selectedFields,
+      int batchSize,
+      Path path,
+      long splitStart,
+      long splitLength) throws IOException {
+    Option<RowDataProjection> castProjection;
+    InternalSchema fileSchema = internalSchemaManager.getFileSchema(path.getName());
+    if (fileSchema.isEmptySchema()) {
+      castProjection = Option.empty();

Review Comment:
   copy-pasted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1316583410

   Is the PR ready for review now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1317964637

   Thanks, I went over the code quickly and found that there are 2 steps by which the input format adapts to schema evolution:
   
   1. fetch the original schema that was in effect when the file was committed, and read the record as is
   2. project the record onto the latest read schema if needed
   
   I would suggest we do only step 1 for all kinds of input formats, and wrap another format that performs step 2 solely for schema evolution, WDYT?
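
   As a rough sketch of step 1 under this proposal (assuming the commit time parsed from the file name doubles as the schema version id, which is how `InternalSchemaCache` is used elsewhere in this thread; the helper name is hypothetical):
   ```java
   import org.apache.hudi.common.fs.FSUtils;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.util.InternalSchemaCache;
   import org.apache.hudi.internal.schema.InternalSchema;
   
   // Resolve the schema that was current when this base file was committed and
   // read the file with it, instead of forcing the latest table schema on it.
   static InternalSchema fileSchemaOf(String fileName, HoodieTableMetaClient metaClient) {
     long versionId = Long.parseLong(FSUtils.getCommitTime(fileName));
     return InternalSchemaCache.searchSchemaAndCache(versionId, metaClient, true /* cache enabled */);
   }
   ```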


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1025146256


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -286,4 +286,18 @@ public static Map<String, String> collectRenameCols(InternalSchema oldSchema, In
       return e.substring(lastDotIndex == -1 ? 0 : lastDotIndex + 1);
     }));
   }
+
+  /**
+   * Returns whether passed types are the same.
+   *
+   * @param t1 first type
+   * @param t2 second type
+   * @return true if types are the same
+   */
+  public static boolean isSameType(Type t1, Type t2) {
+    if (t1 instanceof Types.DecimalType && t2 instanceof Types.DecimalType) {
+      return t1.equals(t2);

Review Comment:
   Fixed https://github.com/apache/hudi/pull/7228



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1354714159

   @voonhous 
   As I understand it, you are talking about this case:
   ```sql
   Flink SQL>
   
   -- write with schema1
   create table tbl(`id` int primary key, `value` int)
       partitioned by (`id`)
       with ('connector'='hudi', 'path'='/tmp/tbl');
   insert into tbl values (1, 10);
   drop table tbl;
   
   -- write with schema2 int => double
   create table tbl(`id` int primary key, `value` double)
       partitioned by (`id`)
       with ('connector'='hudi', 'path'='/tmp/tbl');
   insert into tbl values (2, 20.0);
   
   -- read all data
    select * from tbl; -- throws an exception because tbl consists of two partitioned files, (1, 10) and (2, 20.0)
   ```
   
   ```java
   Caused by: java.lang.IllegalArgumentException: Unexpected type: INT32
   ```
   
   Whereas if we remove `partitioned by ('id')` from the SQL above, tbl will consist of two unpartitioned files, (1, 10.0) and (2, 20.0), and the read query will work fine:
   ```sql
   select * from tbl;
   ```
   ```
   +----+-------------+--------------------------------+
   | op |          id |                          value |
   +----+-------------+--------------------------------+
   | +I |           1 |                           10.0 |
   | +I |           2 |                           20.0 |
   +----+-------------+--------------------------------+
   ```
   
   In my opinion it is a good idea to support the scenario you described for Spark in https://github.com/apache/hudi/pull/7480.
   Currently I have no plans to implement it, so you can do it if you wish.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1316584794

   @danny0405 @xiarixiaoyao I have addressed the comments above. Could you please take a look again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1318024566

   > > is it possible to fetch the original schema when the file was committed if SE disabled?
   > 
   > There is no need to fetch the original schema if SE is disabled.
   > 
   > > Prepare int[] selectedFields according to actual schema (if SE enabled)
   > 
   > This should be quite easy to do.
   > 
   > > but the solution is more complicated than this PR
   > 
   > I have different thoughts and think this makes the input formats code more clean (towards immutable schema or on-read schema) and maintainable.
   
   Ok, thanks, I'll rework the PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022515240


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   > @trushev can we avoid moving this code snippet? I do not think Flink evolution needs to modify this code. #6358 and #7183 will optimize it.
   
   @xiarixiaoyao This code should be moved from `HoodieMergeHelper` to `BaseMergeHelper` due to the current class hierarchy:
   <img width="439" src="https://user-images.githubusercontent.com/42293632/201876103-6e59834e-ad85-4b22-9de4-257e26cdfd88.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152589494

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06a66b2cc3450cd29b13b755976480317e134b4c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(R…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152107876

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 152d9abfe646e966dd40171a15fd5faa5e0a4594 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1153586878

   If it is ready for reviewing, you can ping someone for help :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1177064328

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 88ce744bc98ae26b81b00276a5e289c435188889 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663) 
   * 197780acd4560103dbe846d7bf09bd50efa80066 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927284257


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   There is no single correct schema at the moment of constructing `MergeOnReadInputFormat`:
   baseFile1 with schema1 {id: int, value: int}
   baseFile2 with schema2 {id: int, value: long} -- read schema
   Both files will be passed to the same `MergeOnReadInputFormat#open` as `MergeOnReadInputSplit`s,
   so we need to read baseFile1 with schema1 and then cast `value: int` to `value: long` using `CastMap`.
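
   A minimal sketch of that cast step, reusing the `CastMap` API shown in `TestCastMap` in this PR (position 1 for the `value` column is assumed for illustration):
   ```java
   CastMap castMap = new CastMap();
   castMap.add(1, new IntType(), new BigIntType()); // value: int => long
   Object v = castMap.castIfNeeded(1, 42);          // yields 42L for rows read with schema1
   ```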
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927312625


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##########
@@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
       return this;
     }
 
+    public Builder withInternalSchema(InternalSchema internalSchema) {
+      this.internalSchema = internalSchema;
+      return this;

Review Comment:
   I used the second schema here to be consistent with `HoodieMergedLogRecordScanner`, which already uses this approach to scan logs in `HoodieMergeOnReadRDD#scanLog`.
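
   A minimal sketch of that shared usage (builder methods taken from the `FormatUtils.logScanner` diff in this PR; the remaining options are elided):
   ```java
   HoodieMergedLogRecordScanner scanner = HoodieMergedLogRecordScanner.newBuilder()
       .withBasePath(split.getTablePath())
       .withLogFilePaths(split.getLogPaths().get())
       .withReaderSchema(logSchema)          // avro schema used to decode log blocks
       .withInternalSchema(internalSchema)   // evolved (RFC-33) schema for schema-aware reads
       .withLatestInstantTime(split.getLatestCommit())
       .build();
   ```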



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927388244


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   Sorry, I don't get what you mean by schema metadata that tells us the schema.
   My point is that to read baseFile1 we need to use `DataType[] fullFieldTypes` in `ParquetSplitReaderUtil.genPartColumnarRowReader` built according to schema1.
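
   For illustration, a sketch using the helpers from `SchemaEvolutionContext` in this PR (the context is unwrapped from its `Option` for brevity, and the wiring into the reader itself is elided):
   ```java
   InternalSchema actualSchema = schemaEvolutionContext.getActualSchema(split);          // schema1 of baseFile1
   List<DataType> actualFieldTypes = schemaEvolutionContext.getFieldTypes(actualSchema);
   // actualFieldTypes, not the query schema's types, must be passed to
   // ParquetSplitReaderUtil.genPartColumnarRowReader so baseFile1 is decoded as written
   ```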
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(R…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152492876

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 152d9abfe646e966dd40171a15fd5faa5e0a4594 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211) 
   * 06a66b2cc3450cd29b13b755976480317e134b4c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022275500


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -91,6 +91,11 @@ public static InternalSchema convert(Schema schema) {
     return new InternalSchema(fields);
   }
 
+  /** Convert an avro schema into internalSchema with given versionId. */
+  public static InternalSchema convertToEmpty(Schema schema) {
+    return new InternalSchema(InternalSchema.EMPTY_SCHEMA_VERSION_ID, schema);

Review Comment:
   An Avro schema with an empty version id represents disabled schema evolution.
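
   A one-line sketch of the intended use (method name from the diff above):
   ```java
   InternalSchema schema = AvroInternalSchemaConverter.convertToEmpty(avroSchema); // version id = EMPTY_SCHEMA_VERSION_ID
   ```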
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022522069


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -491,7 +492,7 @@ public Pair<HoodieMetadataMergedLogRecordReader, Long> getLogRecordScanner(List<
         .withFileSystem(metadataMetaClient.getFs())
         .withBasePath(metadataBasePath)
         .withLogFilePaths(sortedLogFilePaths)
-        .withReaderSchema(schema)
+        .withReaderSchema(AvroInternalSchemaConverter.convertToEmpty(schema))

Review Comment:
   HoodieBackedTableMetadata does not support schema evolution,
   so there is no need to modify this code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1180918936

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 239534226d5cf6cbef8ef1e8dc454daf3dacf20b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851) 
   * faa859369ddf9c3724487eb5a028186d0a970154 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181326389

   Thanks, I will take a look this week, and before that, please do not merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179739041

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c71cb55bf081afc59b3c323e59e825a9e482e3c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816) 
   * 2dc34ce7581e1f0c631901ed9060837343220f2f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179708431

   Resolved conflict with master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179960794

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cb28b4f297e0e0b41ce9ea46b8be5002190e9f94 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826) 
   * 691626338b1af1235755ec876dbd35ffbf050ca1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033205796


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   the following example throws an exception as well:
   ```java
   CastMap castMap = new CastMap();
   castMap.add(0, new IntType(), new BigIntType()); // cast int => long
   castMap.castIfNeeded(0, "wrong arg"); // <----- error, expected int but actual is string
   ```
   ```
   java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1309651755

   @danny0405  it's rebased


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1152634451

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1177038874

   Resolved conflict with master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1180412074

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 691626338b1af1235755ec876dbd35ffbf050ca1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831) 
   * 239534226d5cf6cbef8ef1e8dc454daf3dacf20b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1180407161

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 691626338b1af1235755ec876dbd35ffbf050ca1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831) 
   * 239534226d5cf6cbef8ef1e8dc454daf3dacf20b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r908244809


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields that were actual at a certain point in time.
+ * If a field was renamed in the queried schema, its old name, relevant at the provided time, will be returned.
+ * If the type of a field was changed, its old type will be returned, and a projection will be created to convert the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    String commitTime = split.getBasePath()
+        .map(FSUtils::getCommitTime)
+        .orElse(split.getLatestCommit());
+    return getActualSchema(commitTime);
+  }
+
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+    return internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+    return AvroSchemaConverter.convertToDataType(
+        AvroInternalSchemaConverter.convert(internalSchema, getTableName())).getChildren();
+  }
+
+  public CastMap getCastMap(InternalSchema querySchema, InternalSchema actualSchema) {
+    return CastMap.of(getTableName(), querySchema, actualSchema);
+  }
+
+  public static LogicalType[] project(List<DataType> fieldTypes, int[] selectedFields) {
+    return Arrays.stream(selectedFields)

Review Comment:
   It would be better to support nested column projections in the future.
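
   For reference, a sketch of what the current top-level projection handles (indices are illustrative):
   ```java
   // selectedFields = {0, 2} keeps only the logical types of columns 0 and 2;
   // nested projection (e.g. selecting a.b inside a RowType) is not supported yet
   LogicalType[] projected = SchemaEvolutionContext.project(fieldTypes, new int[] {0, 2});
   ```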



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1153568144

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220) 
   * 80d33308c8a4e290c0c7b66fff4023e8825f8163 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1021098630


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -91,6 +91,11 @@ public static InternalSchema convert(Schema schema) {
     return new InternalSchema(fields);
   }
 
+  /** Convert an avro schema into internalSchema with given versionId. */
+  public static InternalSchema convertToEmpty(Schema schema) {
+    return new InternalSchema(InternalSchema.EMPTY_SCHEMA_VERSION_ID, schema);

Review Comment:
   empty schema means schema evolution disabled or appropriate internal schema not found



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1256128696

   > @trushev Good job, I've tested it and it works on the whole, but there are a few defects that I'll point out
   
   Thank you for the feedback. I'll fix these defects soon, as well as the previous ones pointed out by danny0405.
   Sorry for the stale PR. It is hard to maintain the common part of the SE feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288413251

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858",
       "triggerID" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "1288409306",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864355


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FormatUtils.java:
##########
@@ -130,6 +132,7 @@ public static HoodieMergedLogRecordScanner logScanner(
         .withBasePath(split.getTablePath())
         .withLogFilePaths(split.getLogPaths().get())
         .withReaderSchema(logSchema)
+        .withInternalSchema(internalSchema)
         .withLatestInstantTime(split.getLatestCommit())

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033135573


##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkTable.java:
##########
@@ -102,4 +108,9 @@ public <T extends SpecificRecordBase> Option<HoodieTableMetadataWriter> getMetad
       return Option.empty();
     }
   }
+
+  private static void setLatestInternalSchema(HoodieWriteConfig config, HoodieTableMetaClient metaClient) {
+    Option<InternalSchema> internalSchema = new TableSchemaResolver(metaClient).getTableInternalSchemaFromCommitMetadata();

Review Comment:
   replaced with `isPresent()`
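
   For reference, a minimal sketch of the guarded version (the `SerDeHelper.toJson`/`setInternalSchemaString` plumbing is an assumption about how the schema reaches the config, not the exact commit):

   ```java
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.TableSchemaResolver;
   import org.apache.hudi.common.util.Option;
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.hudi.internal.schema.InternalSchema;
   import org.apache.hudi.internal.schema.utils.SerDeHelper;

   final class InternalSchemaSetupSketch {
     private static void setLatestInternalSchema(HoodieWriteConfig config, HoodieTableMetaClient metaClient) {
       Option<InternalSchema> internalSchema =
           new TableSchemaResolver(metaClient).getTableInternalSchemaFromCommitMetadata();
       // isPresent() guard, per the review comment above.
       if (internalSchema.isPresent()) {
         // Assumed setter: serialize the evolved schema into the write config.
         config.setInternalSchemaString(SerDeHelper.toJson(internalSchema.get()));
       }
     }
   }
   ```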



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033124874


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -120,6 +121,12 @@ private FlinkOptions() {
       .withDescription("The default partition name in case the dynamic partition"
           + " column value is null/empty string");
 
+  public static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
+      .key(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key())
+      .booleanType()

Review Comment:
   Without this ConfigOption, we would have to read the value from the Flink conf via the deprecated
   `conf.getBoolean(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key(), false)`
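
   For illustration, a minimal self-contained sketch of the two lookup styles (the `defaultValue`/`withDescription` tail is assumed, since the diff above is truncated, and the key string is inlined in place of `HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key()`):

   ```java
   import org.apache.flink.configuration.ConfigOption;
   import org.apache.flink.configuration.ConfigOptions;
   import org.apache.flink.configuration.Configuration;

   final class SchemaEvolutionOptionSketch {
     // Typed option: callers get a default value and a description for free.
     static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
         .key("hoodie.schema.on.read.enable")
         .booleanType()
         .defaultValue(false)
         .withDescription("Enables reading tables written with comprehensive schema evolution");

     static boolean isEnabled(Configuration conf) {
       // Typed lookup; no need for the deprecated string-keyed variant
       // conf.getBoolean("hoodie.schema.on.read.enable", false).
       return conf.get(SCHEMA_EVOLUTION_ENABLED);
     }
   }
   ```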



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033174400


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   > do we throw exception here 
   
   no, we follow Spark's implementation `org.apache.hudi.client.utils.SparkInternalSchemaConverter#convertDoubleType`
   
   > what kind of data type is castable here
   
   The supported conversions are listed here; the decimal-target cases are illustrated in the sketch below:
   * Float => Double, Decimal
   * Double => Decimal
   * Decimal => Decimal (change precision or scale)
   * String => Decimal
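
   A rough standalone sketch of that routing (the class and method names are illustrative, not the PR's code): each source value is funneled through `BigDecimal`, and `DecimalData.fromBigDecimal` returns null rather than throwing when the value does not fit the target precision/scale.

   ```java
   import org.apache.flink.table.data.DecimalData;

   import java.math.BigDecimal;

   final class DecimalCastSketch {
     static DecimalData toDecimal(Object val, int precision, int scale) {
       BigDecimal decimal;
       if (val instanceof DecimalData) {
         // Decimal => Decimal: only the precision/scale changes.
         decimal = ((DecimalData) val).toBigDecimal();
       } else if (val instanceof Float || val instanceof Double) {
         // Float/Double => Decimal.
         decimal = BigDecimal.valueOf(((Number) val).doubleValue());
       } else {
         // String => Decimal, for numeric strings.
         decimal = new BigDecimal(val.toString());
       }
       return DecimalData.fromBigDecimal(decimal, precision, scale);
     }
   }
   ```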
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022274270


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -91,6 +91,11 @@ public static InternalSchema convert(Schema schema) {
     return new InternalSchema(fields);
   }
 
+  /** Convert an avro schema into internalSchema with given versionId. */
+  public static InternalSchema convertToEmpty(Schema schema) {
+    return new InternalSchema(InternalSchema.EMPTY_SCHEMA_VERSION_ID, schema);

Review Comment:
   So an Avro schema with an `empty` version id represents which case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022274976


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatReader.java:
##########
@@ -141,4 +139,8 @@ public HoodieLogBlock prev() throws IOException {
     return this.currentReader.prev();
   }
 
+  private Schema getReaderSchema() {
+    boolean useWriterSchema = !readerSchema.isEmptySchema();
+    return useWriterSchema ? null : readerSchema.getAvroSchema();

Review Comment:
   Then please also keep the old comments.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022281627


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatReader.java:
##########
@@ -141,4 +139,8 @@ public HoodieLogBlock prev() throws IOException {
     return this.currentReader.prev();
   }
 
+  private Schema getReaderSchema() {
+    boolean useWriterSchema = !readerSchema.isEmptySchema();
+    return useWriterSchema ? null : readerSchema.getAvroSchema();

Review Comment:
   You are right, the old comment should have been kept. I fixed that in `add nullable annotation and comment about writer schema`
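
   Roughly what that fix looks like (the comment wording is an assumption; the logic is unchanged from the diff above):

   ```java
   import org.apache.avro.Schema;
   import org.apache.hudi.internal.schema.InternalSchema;

   import javax.annotation.Nullable;

   class ReaderSchemaSketch {
     private InternalSchema readerSchema;

     @Nullable // null means "read each log block with its own writer schema"
     private Schema getReaderSchema() {
       // When an evolved InternalSchema is present, hand null to the block
       // reader so records keep the schema they were written with and are
       // projected to the query schema later.
       boolean useWriterSchema = !readerSchema.isEmptySchema();
       return useWriterSchema ? null : readerSchema.getAvroSchema();
     }
   }
   ```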



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1314705991

   > withInternalSchema(
   
   Agree


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1330040294

   Can you squash and force push here? I didn't see the Azure CI history; let's re-trigger it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1330031183

   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] XuQianJin-Stars commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1331903292

   @trushev Thanks a lot for contributing this feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] flashJd commented on pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
flashJd commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1255995690

   @danny0405 @xiarixiaoyao this PR has been pending for two months; when can we merge it? Spark only supports full schema evolution in Spark 3.x.x, and my Spark version is 2.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1018637325


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -817,6 +805,8 @@ public abstract static class Builder {
 
     public abstract Builder withReaderSchema(Schema schema);
 
+    public abstract Builder withReaderSchema(InternalSchema internalSchema);
+

Review Comment:
   We should not keep two `withReaderSchema` overloads here; keep only one of them.
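
   One hypothetical way to honor that without touching every call site: keep the `InternalSchema` overload as the only real setter and reduce the Avro one to a thin adapter over it, via the `convertToEmpty` helper from the diff earlier in this thread (the field name is assumed):

   ```java
   import org.apache.avro.Schema;
   import org.apache.hudi.internal.schema.InternalSchema;
   import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;

   abstract class LogRecordReaderBuilderSketch {
     protected InternalSchema readerSchema;

     // The single real setter.
     public LogRecordReaderBuilderSketch withReaderSchema(InternalSchema internalSchema) {
       this.readerSchema = internalSchema;
       return this;
     }

     // Thin adapter: wraps a plain Avro schema into an "empty-versioned" InternalSchema.
     public LogRecordReaderBuilderSketch withReaderSchema(Schema avroSchema) {
       return withReaderSchema(AvroInternalSchemaConverter.convertToEmpty(avroSchema));
     }
   }
   ```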



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1029057838


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
##########
@@ -394,4 +416,22 @@ private InflaterInputStreamFactory<?> getInflaterInputStreamFactory(org.apache.h
     }
   }
 
+  private void setActualFields(FileInputSplit fileSplit) {
+    FlinkInternalSchemaManager sm = schemaManager.get();
+    InternalSchema actualSchema = sm.getActualSchema(fileSplit);
+    List<DataType> fieldTypes = sm.getFieldTypes(actualSchema);
+    CastMap castMap = sm.getCastMap(sm.getQuerySchema(), actualSchema);
+    int[] shiftedSelectedFields = Arrays.stream(selectedFields).map(pos -> pos + HOODIE_META_COLUMNS.size()).toArray();
+    if (castMap.containsAnyPos(shiftedSelectedFields)) {

Review Comment:
   Nice catch, fixed
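
   For readers following the diff: `selectedFields` indexes the data columns only, while the `CastMap` positions include the Hoodie meta columns, hence the shift. A tiny runnable illustration (the selection values are made up; 5 is `HOODIE_META_COLUMNS.size()`):

   ```java
   import java.util.Arrays;

   final class ShiftSketch {
     public static void main(String[] args) {
       int metaColumns = 5;            // _hoodie_commit_time .. _hoodie_file_name
       int[] selectedFields = {0, 2};  // user-facing positions over data columns
       int[] shifted = Arrays.stream(selectedFields).map(pos -> pos + metaColumns).toArray();
       System.out.println(Arrays.toString(shifted)); // [5, 7] -- full-row positions the CastMap sees
     }
   }
   ```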



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1171888697

   ## CI report:
   
   * 5164437958be477aa84e5acc151cda008a8c8607 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280) 
   * 88ce744bc98ae26b81b00276a5e289c435188889 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181248326

   ## CI report:
   
   * ef13a2d832c21b69938c958e1e84e4667d0b402d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858) 
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181036007

   ## CI report:
   
   * faa859369ddf9c3724487eb5a028186d0a970154 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181306711

   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1177066757

   ## CI report:
   
   * 88ce744bc98ae26b81b00276a5e289c435188889 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663) 
   * 197780acd4560103dbe846d7bf09bd50efa80066 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1177165577

   ## CI report:
   
   * 197780acd4560103dbe846d7bf09bd50efa80066 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179728919

   ## CI report:
   
   * c71cb55bf081afc59b3c323e59e825a9e482e3c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179750476

   ## CI report:
   
   * 2dc34ce7581e1f0c631901ed9060837343220f2f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927284257


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -135,6 +139,12 @@
    */
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   There is no single correct schema at the moment of construction of `MergeOnReadInputFormat`:
   baseFile1 with schema1 {id: int, value: int}
   baseFile2 with schema2 {id: int, value: long} -- query schema
   Both files will be passed to the same `MergeOnReadInputFormat#open` as a `MergeOnReadInputSplit`,
   so we need to read baseFile1 with schema1 and then cast `value: int` to `value: long` using `CastMap`.
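   
   A minimal sketch of that cast step, assuming the `CastMap` API exercised in `TestCastMap` elsewhere in this thread (`add(pos, fromType, toType)` and `castIfNeeded(pos, val)`); the field position is illustrative:
   ```java
   // `value` sits at position 1; the file stores int, the query schema expects bigint
   CastMap castMap = new CastMap();
   castMap.add(1, new IntType(), new BigIntType());
   Object cast = castMap.castIfNeeded(1, 42); // yields 42L
   ```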
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927319245


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
##########
@@ -99,10 +113,36 @@ public CopyOnWriteInputFormat(
     this.selectedFields = selectedFields;
     this.conf = new SerializableConfiguration(conf);
     this.utcTimestamp = utcTimestamp;
+    this.schemaEvolutionContext = SchemaEvolutionContext.of(flinkConf);
   }
 
   @Override
   public void open(FileInputSplit fileSplit) throws IOException {
+    String[] actualFieldNames;
+    DataType[] actualFieldTypes;
+    if (schemaEvolutionContext.isPresent()) {
+      SchemaEvolutionContext context = schemaEvolutionContext.get();
+      InternalSchema actualSchema = context.getActualSchema(fileSplit);

Review Comment:
   The same https://github.com/apache/hudi/pull/5830#discussion_r927284257
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1154854470

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5164437958be477aa84e5acc151cda008a8c8607 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] flashJd commented on pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
flashJd commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1255766997

   @trushev Good job, I've tested it and it works on the whole, but there are a few small defects, which I'll point out


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1318006019

   > 1. fetch the original schema when the file was committed, read the record as is
   
   is it possible to fetch the original schema from when the file was committed if SE is disabled?
   
   > I would suggest we only do 1 for all kinds of input formats, and wrap another format only for schema evolution by doing step2, WDYT ?
   
   Yes, it sounds reasonable. Moreover, I've already tried it :) and ran into one problem; maybe you know how to solve it. When SE is enabled we can't do the first step, "read the record as is", because `int[] selectedFields` is based on the latest read schema (aka querySchema), not on the file schema (aka actualSchema). So we need an additional `step 0`:
   
   0. Prepare `int[] selectedFields` according to the actual schema (if SE enabled)
   1. Fetch the original schema from when the file was committed, read the record as is
   2. Project the record onto the latest read schema if needed (if SE enabled)
   So "wrap another format" looks like:
   ```java
   InputFormat inputFormat;
   if (SE.enabled) { // pseudocode: SE = schema evolution
     // step 0 prepares selectedFields, step 1 reads as is, step 2 projects onto the query schema
     inputFormat = new ProjectFormat(new ReadAsIsFormat(new PrepareFormat()));
   } else {
     inputFormat = new ReadAsIsFormat();
   }
   ```
   I mean, this is a solvable problem, but the solution is more complicated than the one in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1318489360

   @danny0405 
   It is hard to maintain this PR. Even though this feature is related only to Flink, changes are needed in the common part. For example, https://github.com/apache/hudi/pull/5830#discussion_r1023681719 and https://github.com/apache/hudi/pull/5830#discussion_r1022504653
   Because of this, merge conflicts often appear. Currently, `HoodieMergeHelper.java` has been modified again. I've decided that **I will make the common-part changes in separate PRs**. I hope that such changes will be quickly approved and merged into the master branch. This will reduce the number of conflicts and make this PR easier to maintain and to review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033212254


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   So, the cast map returns null only if the original value is null
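   
   A hedged illustration of that contract, reusing the `CastMap` calls from the test above:
   ```java
   CastMap castMap = new CastMap();
   castMap.add(0, new IntType(), new BigIntType());
   castMap.castIfNeeded(0, null); // null in, null out
   castMap.castIfNeeded(0, 1);    // a non-null value is cast, here to 1L
   ```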



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1331866546

   @trushev 
   Thanks a lot for contributing this feature and for patiently waiting for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033194944


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -120,6 +121,12 @@ private FlinkOptions() {
       .withDescription("The default partition name in case the dynamic partition"
           + " column value is null/empty string");
 
+  public static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
+      .key(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key())
+      .booleanType()

Review Comment:
   Added `OptionsResolver.isSchemaEvolutionEnabled`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1018635942


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatReader.java:
##########
@@ -141,4 +139,8 @@ public HoodieLogBlock prev() throws IOException {
     return this.currentReader.prev();
   }
 
+  private Schema getReaderSchema() {
+    boolean useWriterSchema = !readerSchema.isEmptySchema();
+    return useWriterSchema ? null : readerSchema.getAvroSchema();

Review Comment:
   This is so confusing: why do we use the writer schema if the reader schema is not empty? And what does an empty internal schema mean?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1309745375

   [3981.patch.zip](https://github.com/apache/hudi/files/9977268/3981.patch.zip)
   Thanks for the contribution, I have reviewed some parts and left a local patch here along with some comments ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022504653


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   @trushev  
   can we avoid moving this code snippet? I don't think Flink schema evolution needs to modify this code.
   https://github.com/apache/hudi/pull/6358   and https://github.com/apache/hudi/pull/7183 will optimize this code
   
   @danny0405  
   we need to check evolution for each base file.
   Once we have made multiple column changes, different base files may have different schemas, and we cannot use the current table schema to read these files directly; an exception will be thrown.
   
   tableA: a int, b string, c double, and there are three files in this table: f1, f2, f3
   
   drop column c from tableA and add a new column d, then update tableA, but only f2 and f3 are updated; f1 is not touched.
   The schemas are now:
   ```
   schema1  from tableA: a int, b string, d long
   schema2  from f2, f3: a int, b string, d long
   schema3  from f1:     a int, b string, c double
   ```
   we should not use schema1 to read f1.
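   
   A hedged sketch of what reading f1 then requires, using the `InternalSchemaMerger` that the Flink format in this PR imports; the boolean flags and variable names are illustrative:
   ```java
   // merge f1's own file schema with the query schema instead of reading f1 with schema1 directly
   InternalSchema mergedSchema = new InternalSchemaMerger(fileSchema, querySchema, true, false).mergeSchema();
   ```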
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022515240


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   > @trushev can we avoid moved this code snippet, i donnot think flink evolution need to modify those codes. #6358 and #7183 will optimize this code
   
   This code should be moved from `HoodieMergeHelper` to `BaseMergeHelper` due to the current class hierarchy:
   <img width="439" src="https://user-images.githubusercontent.com/42293632/201876103-6e59834e-ad85-4b22-9de4-257e26cdfd88.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1021080563


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatReader.java:
##########
@@ -141,4 +139,8 @@ public HoodieLogBlock prev() throws IOException {
     return this.currentReader.prev();
   }
 
+  private Schema getReaderSchema() {
+    boolean useWriterSchema = !readerSchema.isEmptySchema();
+    return useWriterSchema ? null : readerSchema.getAvroSchema();

Review Comment:
   I used a variable here on purpose to point out `useWriterSchema`. Essentially, it represents the removed code
   https://github.com/apache/hudi/blob/release-0.12.1/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L162-L169 
   Added a comment from the original code
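   
   A hedged restatement of the convention behind this helper; `writerSchema` stands for the schema recorded with the log block and is illustrative:
   ```java
   // a null reader schema signals "fall back to the writer schema" downstream
   Schema readerAvroSchema = getReaderSchema();
   Schema effectiveSchema = readerAvroSchema == null ? writerSchema : readerAvroSchema;
   ```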



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
voonhous commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1354413054

   @trushev I've read through the PR and noticed that the scope of the changes included here is limited to supporting Hudi Full Schema Evolution (HFSE).
   
   Prior to HFSE, Hudi has been relying on Avro's native Schema-Resolution (ASR) to perform schema evolution when performing UPSERTs via Spark, where schema changes are applied implicitly.
   
   These implicit schema changes do not write to `.schema`, and hence the feature here will not support ASR reads via Flink.
   
   I provided some examples (mainly on Spark) in this issue: #7444.
   
   I was wondering if you have any plans on supporting ASR reads via Flink. 
   
   If there are none, I plan on adding this support for ASR reads via Flink. I wanted to clarify to prevent duplicated effort on the same feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927319541


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FormatUtils.java:
##########
@@ -130,6 +132,7 @@ public static HoodieMergedLogRecordScanner logScanner(
         .withBasePath(split.getTablePath())
         .withLogFilePaths(split.getLogPaths().get())
         .withReaderSchema(logSchema)
+        .withInternalSchema(internalSchema)
         .withLatestInstantTime(split.getLatestCommit())

Review Comment:
   The same https://github.com/apache/hudi/pull/5830#discussion_r927312625
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1171886471

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5164437958be477aa84e5acc151cda008a8c8607 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280) 
   * 88ce744bc98ae26b81b00276a5e289c435188889 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r927367996


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##########
@@ -135,10 +137,15 @@ public Builder withLogRecordScannerCallback(LogRecordScannerCallback callback) {
       return this;
     }
 
+    public Builder withInternalSchema(InternalSchema internalSchema) {
+      this.internalSchema = internalSchema;
+      return this;

Review Comment:
   Yes, a very confusing practice. The reader/format should be deterministic over a single, statically given schema; it should not care about how the schema is generated or where it comes from. That is, the evolution logic should not be imposed on it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1318015825

   > is it possible to fetch the original schema when the file was committed if SE disabled?
   
   There is no need to fetch the original schema if SE is disabled.
   
   > Prepare int[] selectedFields according to actual schema (if SE enabled)
   
   This should be quite easy to do.
   
   > but the solution is more complicated than this PR
   
   I have different thoughts and think this makes the input format code cleaner (moving towards an immutable or on-read schema) and more maintainable.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022275500


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -91,6 +91,11 @@ public static InternalSchema convert(Schema schema) {
     return new InternalSchema(fields);
   }
 
+  /** Convert an avro schema into internalSchema with given versionId. */
+  public static InternalSchema convertToEmpty(Schema schema) {
+    return new InternalSchema(InternalSchema.EMPTY_SCHEMA_VERSION_ID, schema);

Review Comment:
   An Avro schema with an empty version id represents disabled schema evolution, as we don't want to affect the original avroSchema behavior.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022520427


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -91,6 +91,11 @@ public static InternalSchema convert(Schema schema) {
     return new InternalSchema(fields);
   }
 
+  /** Convert an avro schema into internalSchema with given versionId. */
+  public static InternalSchema convertToEmpty(Schema schema) {
+    return new InternalSchema(InternalSchema.EMPTY_SCHEMA_VERSION_ID, schema);

Review Comment:
   `getEmptyInternalSchema` will give an empty InternalSchema, which indicates that schema evolution is disabled.
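   
   A minimal sketch of that sentinel convention, using the factory named above and the `isEmptySchema()` check from the diff:
   ```java
   InternalSchema schema = InternalSchema.getEmptyInternalSchema();
   if (schema.isEmptySchema()) {
     // schema evolution is disabled; read with the plain Avro schema
   }
   ```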



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1022524997


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/InternalSchema.java:
##########
@@ -66,6 +77,11 @@ public InternalSchema(Field... columns) {
     this(DEFAULT_VERSION_ID, Arrays.asList(columns));
   }
 
+  public InternalSchema(long versionId, Schema avroSchema) {
+    this(versionId, ((Types.RecordType) AvroInternalSchemaConverter.convertToField(avroSchema)).fields());
+    this.avroSchema = avroSchema;
+  }

Review Comment:
   It was an idea for how to solve this one: https://github.com/apache/hudi/pull/5830#discussion_r925340228
   But I'm going to revert it now due to https://github.com/apache/hudi/pull/5830#issuecomment-1314709284



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1331591261

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858",
       "triggerID" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "1288409306",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7982061f9d492b4c4d51ca4589e5a30dbc76530a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860",
       "triggerID" : "1329995935",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "5b7eed269294cb9be8f0875517b062e45e7ddb84",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5b7eed269294cb9be8f0875517b062e45e7ddb84",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860) 
   * 5b7eed269294cb9be8f0875517b062e45e7ddb84 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1031125720


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FlinkInternalSchemaManager.java:
##########
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import org.apache.hadoop.fs.Path;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating names and types of fields that are actual at a certain point in time.
+ * If field is renamed in queried schema, its old name will be returned, which is relevant at the provided time.
+ * If type of field is changed, its old type will be returned, and projection will be created that will convert the old type to the queried one.
+ */
+public final class FlinkInternalSchemaManager implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  /**
+   * Creates the manager if schema evolution enabled.
+   */
+  public static Option<FlinkInternalSchemaManager> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new FlinkInternalSchemaManager(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  FlinkInternalSchemaManager(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  /**
+   * Returns query schema as InternalSchema.
+   */
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  /**
+   * Returns schema of fileSplit.
+   */
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  /**
+   * Returns schema of mor fileSplit.
+   */
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    Option<String> basePath = split.getBasePath();
+    String commitTime;
+    if (basePath.isPresent()) {
+      String name = new Path(basePath.get()).getName();
+      commitTime = FSUtils.getCommitTime(name);
+    } else {
+      commitTime = split.getLatestCommit();

Review Comment:
   This code has been removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1326207485

   > Would review it tomorrow, i see there is a conflict, can we resolve it first.
   
   https://github.com/apache/hudi/pull/6358 has just been merged. None of my tests pass now. Schema evolution is broken with that patch. I need to debug. WIP again
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033192441


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   Thanks, I see we return null when CastMap casts a type that is not in its precedence list; is that reasonable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033305513


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/HoodieParquetSplitReader.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader;
+
+import org.apache.flink.table.data.RowData;
+
+import java.io.IOException;
+
+/**
+ * Hoodie wrapper for flink parquet reader.
+ */
+public final class HoodieParquetSplitReader implements HoodieParquetReader {
+  private final ParquetColumnarRowSplitReader reader;
+
+  public HoodieParquetSplitReader(ParquetColumnarRowSplitReader reader) {
+    this.reader = reader;
+  }

Review Comment:
   I avoided it on purpose because:
   1) `ParquetColumnarRowSplitReader` is copied from Flink. I'd like to avoid any changes in this class.
   2) We would have to maintain 3 versions of it: 1.13.x, 1.14.x, 1.15.x.
   3) There is a note in `ParquetSplitReaderUtil`:
   ```
    * <p>NOTE: reference from Flink release 1.11.2 {@code ParquetSplitReaderUtil}, modify to support INT64
    * based TIMESTAMP_MILLIS as ConvertedType, should remove when Flink supports that.
   ```
   I think if we remove `ParquetSplitReaderUtil`, then we should remove `ParquetColumnarRowSplitReader` as well.
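   
   For context, a minimal sketch of how such a thin wrapper can delegate to the copied reader without touching it (assuming the copied Flink reader's `reachedEnd`/`nextRecord`/`close` methods):
   ```java
   // Sketch only: delegation keeps the copied Flink class untouched while the
   // Hudi-side interface stays stable across the 1.13.x/1.14.x/1.15.x copies.
   @Override
   public boolean reachedEnd() throws IOException {
     return reader.reachedEnd();
   }
   
   @Override
   public RowData nextRecord() {
     return reader.nextRecord();
   }
   
   @Override
   public void close() throws IOException {
     reader.close();
   }
   ```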



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033190497


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -120,6 +121,12 @@ private FlinkOptions() {
       .withDescription("The default partition name in case the dynamic partition"
           + " column value is null/empty string");
 
+  public static final ConfigOption<Boolean> SCHEMA_EVOLUTION_ENABLED = ConfigOptions
+      .key(HoodieCommonConfig.SCHEMA_EVOLUTION_ENABLE.key())
+      .booleanType()

Review Comment:
   No worries, just add a tool in `OptionsResolver`
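   
   For illustration, such a helper could look like this (a sketch only; the method name is hypothetical):
   ```java
   // Sketch only: centralizes the flag lookup in OptionsResolver instead of
   // repeating conf.getBoolean(...) in every input format.
   public static boolean isSchemaEvolutionEnabled(Configuration conf) {
     return conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED);
   }
   ```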



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1033205796


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestCastMap.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigDecimal;
+import java.time.LocalDate;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests for {@link CastMap}.
+ */
+public class TestCastMap {
+
+  @Test
+  public void testCastInt() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new IntType(), new BigIntType());
+    castMap.add(1, new IntType(), new FloatType());
+    castMap.add(2, new IntType(), new DoubleType());
+    castMap.add(3, new IntType(), new DecimalType());
+    castMap.add(4, new IntType(), new VarCharType());
+    int val = 1;
+    assertEquals(1L, castMap.castIfNeeded(0, val));
+    assertEquals(1.0F, castMap.castIfNeeded(1, val));
+    assertEquals(1.0, castMap.castIfNeeded(2, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(3, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(4, val));
+  }
+
+  @Test
+  public void testCastLong() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new BigIntType(), new FloatType());
+    castMap.add(1, new BigIntType(), new DoubleType());
+    castMap.add(2, new BigIntType(), new DecimalType());
+    castMap.add(3, new BigIntType(), new VarCharType());
+    long val = 1L;
+    assertEquals(1.0F, castMap.castIfNeeded(0, val));
+    assertEquals(1.0, castMap.castIfNeeded(1, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(2, val));
+    assertEquals(BinaryStringData.fromString("1"), castMap.castIfNeeded(3, val));
+  }
+
+  @Test
+  public void testCastFloat() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new FloatType(), new DoubleType());
+    castMap.add(1, new FloatType(), new DecimalType());
+    castMap.add(2, new FloatType(), new VarCharType());
+    float val = 1F;
+    assertEquals(1.0, castMap.castIfNeeded(0, val));
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(1, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(2, val));
+  }
+
+  @Test
+  public void testCastDouble() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DoubleType(), new DecimalType());
+    castMap.add(1, new DoubleType(), new VarCharType());
+    double val = 1;
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 1, 0), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));
+  }
+
+  @Test
+  public void testCastDecimal() {
+    CastMap castMap = new CastMap();
+    castMap.add(0, new DecimalType(2, 1), new DecimalType(3, 2));
+    castMap.add(1, new DecimalType(), new VarCharType());
+    DecimalData val = DecimalData.fromBigDecimal(BigDecimal.ONE, 2, 1);
+    assertEquals(DecimalData.fromBigDecimal(BigDecimal.ONE, 3, 2), castMap.castIfNeeded(0, val));
+    assertEquals(BinaryStringData.fromString("1.0"), castMap.castIfNeeded(1, val));

Review Comment:
   the following example throws an exception as well:
   ```java
   CastMap castMap = new CastMap();
   castMap.add(0, new IntType(), new BigIntType());
   castMap.castIfNeeded(0, "wrong arg"); // <----- error, expected int but actual is string
   ```
   ```
   java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002875151


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields that were in effect at a certain point in time.
+ * If a field was renamed in the queried schema, its old name, relevant at the provided time, is returned.
+ * If the type of a field was changed, its old type is returned, and a projection is created that converts the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    String commitTime = split.getBasePath()
+        .map(FSUtils::getCommitTime)
+        .orElse(split.getLatestCommit());
+    return getActualSchema(commitTime);
+  }
+
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+    return internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+    return AvroSchemaConverter.convertToDataType(

Review Comment:
   Fixed



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Type;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.internal.schema.utils.InternalSchemaUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.binary.BinaryStringData;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.LogicalTypeRoot;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.time.LocalDate;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.BIGINT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DATE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DECIMAL;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.DOUBLE;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.FLOAT;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.INTEGER;
+import static org.apache.flink.table.types.logical.LogicalTypeRoot.VARCHAR;
+
+/**
+ * CastMap is responsible for the conversion of Flink types when full schema evolution is enabled.
+ */

Review Comment:
   Added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [WIP][HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1001519031


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating the names and types of fields that were in effect at a certain point in time.
+ * If a field was renamed in the queried schema, its old name, relevant at the provided time, is returned.
+ * If the type of a field was changed, its old type is returned, and a projection is created that converts the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+    if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      return new TableSchemaResolver(metaClient)
+          .getTableInternalSchemaFromCommitMetadata()
+          .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+    } else {
+      return Option.empty();
+    }
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, InternalSchema querySchema) {
+    this.metaClient = metaClient;
+    this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+    return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+    return getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+    String commitTime = split.getBasePath()
+        .map(FSUtils::getCommitTime)

Review Comment:
   Good catch, thanks



##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/ITTestSchemaEvolution.java:
##########
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.sink;
+
+import org.apache.hudi.client.HoodieFlinkWriteClient;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.keygen.ComplexAvroKeyGenerator;
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
+import org.apache.hudi.table.HoodieTableFactory;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.TableResult;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.factories.FactoryUtil;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.CloseableIterator;
+import org.apache.flink.util.Preconditions;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.internal.schema.action.TableChange.ColumnPositionChange.ColumnPositionType.AFTER;
+import static org.apache.hudi.utils.TestConfigurations.ROW_TYPE;
+import static org.apache.hudi.utils.TestConfigurations.ROW_TYPE_EVOLUTION;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class ITTestSchemaEvolution extends AbstractTestBase {
+  @TempDir File tempFile;
+  StreamExecutionEnvironment env;
+  StreamTableEnvironment tEnv;
+
+  String[] expectedMergedResult = new String[] {
+      "+I[Danny, 10000.1, 23]",
+      "+I[Stephen, null, 33]",
+      "+I[Julian, 30000.3, 53]",
+      "+I[Fabian, null, 31]",
+      "+I[Sophia, null, 18]",
+      "+I[Emma, null, 20]",
+      "+I[Bob, null, 44]",
+      "+I[Han, null, 56]",
+      "+I[Alice, 90000.9, unknown]"
+  };
+
+  String[] expectedUnMergedResult = new String[] {
+      "+I[Danny, null, 23]",
+      "+I[Stephen, null, 33]",
+      "+I[Julian, null, 53]",
+      "+I[Fabian, null, 31]",
+      "+I[Sophia, null, 18]",
+      "+I[Emma, null, 20]",
+      "+I[Bob, null, 44]",
+      "+I[Han, null, 56]",
+      "+I[Alice, 90000.9, unknown]",
+      "+I[Danny, 10000.1, 23]",
+      "+I[Julian, 30000.3, 53]"
+  };
+
+  @BeforeEach
+  public void setUp() {
+    env = StreamExecutionEnvironment.getExecutionEnvironment();
+    env.setParallelism(1);
+    tEnv = StreamTableEnvironment.create(env);
+  }
+
+  @Test
+  public void testCopyOnWriteInputFormat() throws Exception {
+    testRead(defaultOptionMap(tempFile.getAbsolutePath()));
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatBaseFileOnlyIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.READ_AS_STREAMING.key(), true);
+    optionMap.put(FlinkOptions.READ_START_COMMIT.key(), FlinkOptions.START_COMMIT_EARLIEST);
+    testRead(optionMap);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatBaseFileOnlyFilteringIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.READ_AS_STREAMING.key(), true);
+    optionMap.put(FlinkOptions.READ_START_COMMIT.key(), 1);
+    testRead(optionMap);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatLogFileOnlyIteratorGetLogFileIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    testRead(optionMap);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatLogFileOnlyIteratorGetUnMergedLogFileIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.READ_AS_STREAMING.key(), true);
+    optionMap.put(FlinkOptions.READ_START_COMMIT.key(), FlinkOptions.START_COMMIT_EARLIEST);
+    optionMap.put(FlinkOptions.CHANGELOG_ENABLED.key(), true);
+    testRead(optionMap, expectedUnMergedResult);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatMergeIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 1);
+    testRead(optionMap, true);
+  }
+
+  @Test
+  public void testMergeOnReadInputFormatSkipMergeIterator() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 1);
+    optionMap.put(FlinkOptions.MERGE_TYPE.key(), FlinkOptions.REALTIME_SKIP_MERGE);
+    testRead(optionMap, true, expectedUnMergedResult);
+  }
+
+  @SuppressWarnings({"SqlDialectInspection", "SqlNoDataSourceInspection"})
+  @Test
+  public void testCompaction() throws Exception {
+    OptionMap optionMap = defaultOptionMap(tempFile.getAbsolutePath());
+    optionMap.put(FlinkOptions.TABLE_TYPE.key(), FlinkOptions.TABLE_TYPE_MERGE_ON_READ);
+    optionMap.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 1);
+    testRead(optionMap, new String[0]);

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179738256

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c71cb55bf081afc59b3c323e59e825a9e482e3c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816) 
   * 2dc34ce7581e1f0c631901ed9060837343220f2f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179958853

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cb28b4f297e0e0b41ce9ea46b8be5002190e9f94 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826) 
   * 691626338b1af1235755ec876dbd35ffbf050ca1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1179716753

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 197780acd4560103dbe846d7bf09bd50efa80066 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754) 
   * c71cb55bf081afc59b3c323e59e825a9e482e3c4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1180030100

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 691626338b1af1235755ec876dbd35ffbf050ca1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r908231628


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java:
##########
@@ -227,7 +228,10 @@ protected void loadRecords(String partitionPath) throws Exception {
             .filter(logFile -> isValidFile(logFile.getFileStatus()))
             .map(logFile -> logFile.getPath().toString())
             .collect(toList());
-        HoodieMergedLogRecordScanner scanner = FormatUtils.logScanner(logPaths, schema, latestCommitTime.get().getTimestamp(),
+        InternalSchema internalSchema = new TableSchemaResolver(this.hoodieTable.getMetaClient())

Review Comment:
   We can reuse the TableSchemaResolver that is created at line 200.
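   
   A sketch of the suggested reuse (the variable name is illustrative, assuming the resolver created earlier is still in scope):
   ```java
   // Sketch only: reuse the already-constructed resolver rather than
   // instantiating a second TableSchemaResolver.
   InternalSchema internalSchema = schemaResolver
       .getTableInternalSchemaFromCommitMetadata()
       .orElse(InternalSchema.getEmptyInternalSchema());
   ```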



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1181243368

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9280",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     }, {
       "hash" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9663",
       "triggerID" : "88ce744bc98ae26b81b00276a5e289c435188889",
       "triggerType" : "PUSH"
     }, {
       "hash" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9754",
       "triggerID" : "197780acd4560103dbe846d7bf09bd50efa80066",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9816",
       "triggerID" : "c71cb55bf081afc59b3c323e59e825a9e482e3c4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9818",
       "triggerID" : "2dc34ce7581e1f0c631901ed9060837343220f2f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9826",
       "triggerID" : "cb28b4f297e0e0b41ce9ea46b8be5002190e9f94",
       "triggerType" : "PUSH"
     }, {
       "hash" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9831",
       "triggerID" : "691626338b1af1235755ec876dbd35ffbf050ca1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9851",
       "triggerID" : "239534226d5cf6cbef8ef1e8dc454daf3dacf20b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9856",
       "triggerID" : "faa859369ddf9c3724487eb5a028186d0a970154",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858",
       "triggerID" : "ef13a2d832c21b69938c958e1e84e4667d0b402d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ef13a2d832c21b69938c958e1e84e4667d0b402d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9858) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1308308066

   @trushev Can you rebase the code again? I'm planning to review this code again ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1021115903


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -817,6 +805,8 @@ public abstract static class Builder {
 
     public abstract Builder withReaderSchema(Schema schema);
 
+    public abstract Builder withReaderSchema(InternalSchema internalSchema);
+

Review Comment:
   removed `withReaderSchema(Schema schema)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1309655985

   @danny0405 rebased


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] waywtdcc commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1347854543

   Hope this PR can be merged into 0.12.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1153600248

   @xiarixiaoyao could you please review this PR :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiarixiaoyao commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1156188215

   @trushev thanks for your contribution, I will review it in the next few days


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution(RFC-33)

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1154717501

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9211",
       "triggerID" : "152d9abfe646e966dd40171a15fd5faa5e0a4594",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9218",
       "triggerID" : "06a66b2cc3450cd29b13b755976480317e134b4c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9220",
       "triggerID" : "7d2b94325ccd8eb7f424f5c509d753b3e2d2c6f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252",
       "triggerID" : "80d33308c8a4e290c0c7b66fff4023e8825f8163",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5164437958be477aa84e5acc151cda008a8c8607",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5164437958be477aa84e5acc151cda008a8c8607",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 80d33308c8a4e290c0c7b66fff4023e8825f8163 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9252) 
   * 5164437958be477aa84e5acc151cda008a8c8607 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org