Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/07 06:05:06 UTC

[GitHub] [hudi] dongkelun opened a new pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

dongkelun opened a new pull request #4533:
URL: https://github.com/apache/hudi/pull/4533




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] hudi-bot commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

hudi-bot commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007461496


   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965) 
   * fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4983) 
   



[GitHub] [hudi] nsivabalan commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007388310


   @xiarixiaoyao: Can you please review this patch? Thanks.



[GitHub] [hudi] dongkelun commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007410108


   > @dongkelun : can you please check if [HUDI-3192](https://issues.apache.org/jira/browse/HUDI-3192) and https://issues.apache.org/jira/browse/HUDI-2682 are duplicates. if yes, please mark one of them as duplicate and close it.
   
   I think it's a duplicate. [HUDI-3192](https://issues.apache.org/jira/browse/HUDI-3192) has been closed.



[GitHub] [hudi] dongkelun commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

dongkelun commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007187752


   @xushiyan @nsivabalan Hello, could you please review this?



[GitHub] [hudi] dongkelun commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780280392



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       > How about changing this code to: `if (cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable) { hoodieHiveClient.updateTableProperties(tableName, tableProperties); LOG.info("Sync table properties for " + tableName + ", table properties is: " + (cfg.tableProperties == null ? "" : cfg.tableProperties)); }`
   
   Sorry, I only just saw this. Let me think about it first.





[GitHub] [hudi] dongkelun commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007454674


   > Hi @xiarixiaoyao. Thanks for looking at this. I'm not sure we can solve this from Hudi; the problem happens on vanilla Spark too. See my explanations here: https://lists.apache.org/thread/9mmrnc5o7w42z723s2yqgcrdpwwtts3x
   
   Hello, I think this [PR](https://github.com/apache/hudi/pull/2283) can explain why it is necessary



[GitHub] [hudi] dongkelun commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780286195



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       OK, I see. Thank you for your reminder. Your idea is better





[GitHub] [hudi] xiarixiaoyao merged pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao merged pull request #4533:
URL: https://github.com/apache/hudi/pull/4533


   



[GitHub] [hudi] dongkelun commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780276868



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       If new columns are added and `cfg.tableProperties` is null, `updateTableProperties` is not executed, so Spark SQL will not see the new columns.
   I'm not sure whether deleted and updated columns behave the same way.
   If not, I think this can be decided with `schemaDiff.getAddColumnTypes().isEmpty()`.
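
The failure mode described in this comment can be sketched in isolation. The class below is a hypothetical stand-in, not the real `HiveSyncTool`: the name `OldGuardSketch` and the method `syncsTableProperties` are assumptions made for illustration, and the only logic taken from the thread is the `cfg.tableProperties != null` check removed by the diff.

```java
// Hypothetical stand-in for the guard removed by the diff; not Hudi code.
class OldGuardSketch {

  // Returns true when the old guard would have called updateTableProperties.
  // The parameter models cfg.tableProperties, the user-supplied properties string.
  static boolean syncsTableProperties(String cfgTableProperties) {
    return cfgTableProperties != null;
  }

  public static void main(String[] args) {
    // Schema changed, but the user configured no extra table properties:
    // the old guard skips the update, so properties in the metastore go stale.
    System.out.println(syncsTableProperties(null));
    // Schema changed and the user configured properties: the update runs.
    System.out.println(syncsTableProperties("k1=v1"));
  }
}
```

Under this guard the update depends on an unrelated user setting rather than on whether the schema changed, which matches the symptom reported here.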
   





[GitHub] [hudi] dongkelun commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780276868



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       I have submitted the newly modified code





[GitHub] [hudi] dongkelun commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780292306



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       How about changing the log like this?
   ```java
   LOG.info("Sync table properties for " + tableName + ", table properties is: " + tableProperties);
   ```





[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780270395



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       `cfg.tableProperties`? I think it should be `tableProperties`.

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       How about changing this code to:
           if (cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable) {
             hoodieHiveClient.updateTableProperties(tableName, tableProperties);
             LOG.info("Sync table properties for " + tableName + ", table properties is: "
                     + (cfg.tableProperties == null ? "" : cfg.tableProperties));
           }

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       There is no need to use `schemaDiff.getAddColumnTypes().isEmpty()`. Your change is OK; just note
   that `cfg.tableProperties` may be null, and that this logic is only needed when syncing as a Spark DataSource table.
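
The guard suggested in this review can be tried outside Hudi with a minimal sketch. The class and method names below (`TablePropertiesGuardSketch`, `shouldUpdateTableProperties`, `logMessage`) are hypothetical; only the boolean condition and the null-safe log text come from the comments above. The two parameters model `cfg.tableProperties` and `cfg.syncAsSparkDataSourceTable`.

```java
// Hypothetical sketch of the guard proposed in the review; not Hudi code.
class TablePropertiesGuardSketch {

  // Update table properties when the user supplied some, or when the table is
  // synced as a Spark DataSource table (the case the review says needs it).
  static boolean shouldUpdateTableProperties(String cfgTableProperties,
                                             boolean syncAsSparkDataSourceTable) {
    return cfgTableProperties != null || syncAsSparkDataSourceTable;
  }

  // Builds the log line without printing "null" when no properties were configured.
  static String logMessage(String tableName, String cfgTableProperties) {
    return "Sync table properties for " + tableName + ", table properties is: "
        + (cfgTableProperties == null ? "" : cfgTableProperties);
  }

  public static void main(String[] args) {
    System.out.println(shouldUpdateTableProperties(null, true));
    System.out.println(shouldUpdateTableProperties(null, false));
    System.out.println(logMessage("my_table", null));
  }
}
```

Compared with removing the check outright, this keeps the update from running for tables that neither configure properties nor sync as Spark DataSource tables.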





[GitHub] [hudi] parisni commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

parisni commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007442525






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] dongkelun edited a comment on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun edited a comment on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007410108


   > @dongkelun: can you please check whether [HUDI-3192](https://issues.apache.org/jira/browse/HUDI-3192) and https://issues.apache.org/jira/browse/HUDI-2682 are duplicates? If so, please mark one of them as a duplicate and close it.
   
   I think it's a duplicate. [HUDI-3192](https://issues.apache.org/jira/browse/HUDI-3192) has been closed.


[GitHub] [hudi] hudi-bot commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

hudi-bot commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007164625


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] dongkelun commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

dongkelun commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007164107


   This PR is to solve this [issue](https://github.com/apache/hudi/issues/4525)



[GitHub] [hudi] hudi-bot removed a comment on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

hudi-bot removed a comment on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007166026


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] parisni commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

parisni commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007504642


   > adding columns with Hive SQL is not supported
   
   > we have no way to control the behavior of hive
   
   Does this mean that hive_sync should be equivalent to `jdbc/hms` and distinct from `hiveql` when syncing the metastore?
   
   


[GitHub] [hudi] dongkelun commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780310418



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       I have submitted the newly modified code





[GitHub] [hudi] dongkelun commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

dongkelun commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007464164


   > Hi @xiarixiaoyao. Thanks for looking at this. Not sure we can solve this from Hudi. The problem happens on vanilla Spark too. See my explanation here: https://lists.apache.org/thread/9mmrnc5o7w42z723s2yqgcrdpwwtts3x
   
   I packaged and verified it today. It should solve this problem. However, adding columns with Hive SQL is not supported.



[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780270395



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       cfg.tableProperties? I think it should be tableProperties.
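
To illustrate the point, here is a minimal sketch of why the logged value and the applied value can differ. It assumes that `tableProperties` is a map built from the raw `cfg.tableProperties` string plus Spark DataSource entries; the class name, the `parse` and `merge` helpers, and the newline-separated `key=value` input format are hypothetical and are not taken from HiveSyncTool.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class MergedTablePropertiesSketch {

  // Hypothetical parser: assumes newline-separated key=value pairs.
  // A null or blank input yields an empty map.
  static Map<String, String> parse(String raw) {
    Map<String, String> out = new LinkedHashMap<>();
    if (raw == null || raw.trim().isEmpty()) {
      return out;
    }
    for (String line : raw.split("\n")) {
      int eq = line.indexOf('=');
      if (eq > 0) {
        out.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
      }
    }
    return out;
  }

  // Merges the user-supplied properties with Spark-specific ones.
  // Spark entries overwrite user entries that share the same key.
  static Map<String, String> merge(String raw, Map<String, String> sparkProps) {
    Map<String, String> merged = parse(raw);
    merged.putAll(sparkProps);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> sparkProps = new HashMap<>();
    sparkProps.put("spark.sql.sources.provider", "hudi");

    String rawFromConfig = null; // the user did not configure any table properties
    Map<String, String> applied = merge(rawFromConfig, sparkProps);

    // The raw config is null, yet the map that is actually applied is not empty.
    System.out.println("raw config: " + rawFromConfig);
    System.out.println("applied:    " + applied);
  }
}
```

Under these assumptions, logging the merged map reports what was actually written to the metastore, while logging the raw config string can print `null` even though properties were synced.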





[GitHub] [hudi] hudi-bot removed a comment on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

hudi-bot removed a comment on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007458724


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965) 
   * fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] nsivabalan commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007392092


   @dongkelun: can you please check whether HUDI-3192 and https://issues.apache.org/jira/browse/HUDI-2682 are duplicates? If so, please mark one of them as a duplicate and close it.



[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780282071



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       No need to use schemaDiff.getAddColumnTypes().isEmpty(). Your modification is OK; just
   note that cfg.tableProperties may be null, and that we only need this logic when syncing as a DataSource table.





[GitHub] [hudi] hudi-bot commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

hudi-bot commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007186261


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao commented on a change in pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#discussion_r780276528



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -251,10 +251,8 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
         LOG.info("Schema difference found for " + tableName);
         hoodieHiveClient.updateTableDefinition(tableName, schema);
         // Sync the table properties if the schema has changed
-        if (cfg.tableProperties != null) {
-          hoodieHiveClient.updateTableProperties(tableName, tableProperties);
-          LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
-        }
+        hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+        LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);

Review comment:
       How about changing this code to:
           if (cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable) {
             hoodieHiveClient.updateTableProperties(tableName, tableProperties);
             LOG.info("Sync table properties for " + tableName + ", table properties is: "
                     + (cfg.tableProperties == null ? "" : cfg.tableProperties));
           }
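
The guard suggested above can be exercised in isolation. The following is a minimal sketch, not the HiveSyncTool code: `SyncConfigSketch` is a hypothetical stand-in that models only the two config fields named in the suggestion, and the helper method names are made up for illustration.

```java
// Hypothetical stand-in for the sync config; only the two fields
// mentioned in the review suggestion are modeled.
class SyncConfigSketch {
  String tableProperties;              // raw user-supplied string, may be null
  boolean syncAsSparkDataSourceTable;  // whether Spark DataSource properties are synced
}

class TablePropertiesGuardSketch {

  // True when table properties should be pushed after a schema change:
  // either the user configured some, or Spark DataSource properties
  // (which carry the schema) have to be refreshed.
  static boolean shouldSyncTableProperties(SyncConfigSketch cfg) {
    return cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable;
  }

  // Builds the log line, printing an empty string instead of "null"
  // when no user properties were configured.
  static String logMessage(String tableName, SyncConfigSketch cfg) {
    return "Sync table properties for " + tableName + ", table properties is: "
        + (cfg.tableProperties == null ? "" : cfg.tableProperties);
  }

  public static void main(String[] args) {
    SyncConfigSketch cfg = new SyncConfigSketch();
    System.out.println(shouldSyncTableProperties(cfg)); // neither condition holds

    cfg.syncAsSparkDataSourceTable = true;
    if (shouldSyncTableProperties(cfg)) {
      System.out.println(logMessage("my_table", cfg));
    }
  }
}
```

Note that concatenating a null `String` in Java produces the text `null` rather than throwing, so the ternary in the log line is about readable output, not about avoiding an exception.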





[GitHub] [hudi] xiarixiaoyao commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007472831


   @parisni We want Spark SQL to treat Hudi as a DataSource table for better performance.
   When Spark reads a DataSource table, it restores the table metadata (including the table schema) from the table properties.
   You can see the original code in Spark's HiveExternalCatalog.restoreTableMetadata:
      * It reads table schema, provider, partition column names and bucket specification from table
      * properties, and filter out these special entries from table properties.
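
As a rough illustration of why stale table properties hide new columns, here is a minimal sketch of the split-and-restore idea. The property keys `spark.sql.sources.schema.numParts` and `spark.sql.sources.schema.part.N` are the ones Spark uses for DataSource tables; the class, the helper methods, and the sample schema strings are hypothetical, and this is not Spark's or Hudi's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SparkSchemaPropsSketch {
  static final String NUM_PARTS = "spark.sql.sources.schema.numParts";
  static final String PART_PREFIX = "spark.sql.sources.schema.part.";

  // Splits the schema JSON into chunks of at most `threshold` characters
  // and stores each chunk under its own property key.
  static Map<String, String> toProperties(String schemaJson, int threshold) {
    if (threshold <= 0) {
      throw new IllegalArgumentException("threshold must be positive");
    }
    Map<String, String> props = new LinkedHashMap<>();
    int numParts = 0;
    for (int start = 0; start < schemaJson.length(); start += threshold) {
      int end = Math.min(schemaJson.length(), start + threshold);
      props.put(PART_PREFIX + numParts, schemaJson.substring(start, end));
      numParts++;
    }
    props.put(NUM_PARTS, String.valueOf(numParts));
    return props;
  }

  // Reassembles the schema JSON by concatenating the parts in index order.
  static String restoreSchemaJson(Map<String, String> props) {
    int numParts = Integer.parseInt(props.get(NUM_PARTS));
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numParts; i++) {
      sb.append(props.get(PART_PREFIX + i));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Illustrative placeholders, not real Spark schema JSON.
    String oldSchema = "fields:[id,name]";
    String newSchema = "fields:[id,name,new_col]";

    // Properties written when the table was first synced.
    Map<String, String> metastoreProps = toProperties(oldSchema, 10);

    // The column list is updated, but the properties are not refreshed:
    // a reader that trusts the properties still sees the old schema.
    System.out.println(restoreSchemaJson(metastoreProps).equals(oldSchema));

    // Refreshing the properties after the schema change fixes the view.
    metastoreProps = toProperties(newSchema, 10);
    System.out.println(restoreSchemaJson(metastoreProps).equals(newSchema));
  }
}
```

This is why the schema-change branch has to rewrite the table properties whenever the table is synced as a Spark DataSource table, even if the user configured no extra properties.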



[GitHub] [hudi] xiarixiaoyao commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

xiarixiaoyao commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007475762


   @dongkelun We have no way to control the behavior of Hive, so I think this PR is OK. Thanks for your contribution.
   LGTM, waiting for CI to pass.



[GitHub] [hudi] hudi-bot commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

hudi-bot commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007583568


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4983",
       "triggerID" : "fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4983) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4533: [HUDI-3192] Spark metastore schema evolution broken

Posted by GitBox <gi...@apache.org>.

hudi-bot commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007166026


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

hudi-bot commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007458724


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965",
       "triggerID" : "63cee6177c62cf267849d4f9379eaad88fd5f584",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63cee6177c62cf267849d4f9379eaad88fd5f584 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4965) 
   * fc2ebe0327f1125d4b6fd3a1e65f969f6754aae7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] parisni commented on pull request #4533: [HUDI-2682] Spark schema not updated with new columns on hive sync

Posted by GitBox <gi...@apache.org>.

parisni commented on pull request #4533:
URL: https://github.com/apache/hudi/pull/4533#issuecomment-1007442525


   Hi @xiarixiaoyao. Thanks for looking at this. Not sure we can solve this from Hudi. The problem happens on vanilla Spark too. See my explanation here: https://lists.apache.org/thread/9mmrnc5o7w42z723s2yqgcrdpwwtts3x

