Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/08/30 01:05:45 UTC

[GitHub] [hudi] parisni opened a new pull request, #6537: Avoid update metastore schema if only missing column in input

parisni opened a new pull request, #6537:
URL: https://github.com/apache/hudi/pull/6537

   ### Change Logs
   
   Currently, when a Hudi table moves from schema1 to schema2 and data is then inserted with the old schema1, schema2 is kept for the whole table.
   
   This is inconsistent with the Hive metastore, whose schema gets updated back to the old schema1.
   
   This PR therefore avoids updating the Hive schema when the input data is only missing columns.
   
   As proposed here, this might only work when reconcile = true;
   see https://hudi.apache.org/docs/configurations#hoodiedatasourcewritereconcileschema
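
A minimal sketch of the rule proposed above, not the actual Hudi code: the metastore update is skipped when the computed schema difference contains only columns that are absent from the input. `SchemaDiffSketch`, its fields, and `shouldUpdateMetastore` are hypothetical names introduced for illustration; only the idea of checking the added and updated column sets comes from this PR.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the schema difference computed during Hive sync.
final class SchemaDiffSketch {
    final Map<String, String> addColumnTypes;    // columns in the input but not in the metastore
    final Map<String, String> updateColumnTypes; // columns whose type changed
    final List<String> deleteColumns;            // columns in the metastore but missing from the input

    SchemaDiffSketch(Map<String, String> addColumnTypes,
                     Map<String, String> updateColumnTypes,
                     List<String> deleteColumns) {
        this.addColumnTypes = addColumnTypes;
        this.updateColumnTypes = updateColumnTypes;
        this.deleteColumns = deleteColumns;
    }

    // True when the two schemas are identical.
    boolean isEmpty() {
        return addColumnTypes.isEmpty() && updateColumnTypes.isEmpty() && deleteColumns.isEmpty();
    }

    // Proposed rule: only touch the metastore when a column was added or its type changed.
    boolean shouldUpdateMetastore() {
        return !addColumnTypes.isEmpty() || !updateColumnTypes.isEmpty();
    }

    public static void main(String[] args) {
        // The input written with the old schema lacks "new_col", and nothing else differs.
        SchemaDiffSketch onlyMissing = new SchemaDiffSketch(
            Collections.emptyMap(), Collections.emptyMap(), Collections.singletonList("new_col"));
        System.out.println(onlyMissing.isEmpty());               // false
        System.out.println(onlyMissing.shouldUpdateMetastore()); // false
    }
}
```

With this rule a difference is still detected, but it does not reach the metastore unless it adds or retypes a column.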
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] parisni commented on pull request #6537: [HUDI-4762] Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
parisni commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1233904855

   added jira




[GitHub] [hudi] parisni commented on pull request #6537: [HUDI-4762] Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
parisni commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1249828151

   Agreed, I will close this
   
   On Thu, 2022-09-15 at 22:47 -0700, Shiyan Xu wrote:
   > @xushiyan commented on this pull request.
   > 
   > 
   > 
   > > @@ -286,7 +286,11 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
   >            config.getBooleanOrDefault(HIVE_SUPPORT_TIMESTAMP_TYPE));
   >        if (!schemaDiff.isEmpty()) {
   >          LOG.info("Schema difference found for " + tableName);
   > -        syncClient.updateTableSchema(tableName, schema);
   > +        if (!schemaDiff.getAddColumnTypes().isEmpty() || !schemaDiff.getUpdateColumnTypes().isEmpty()) {
   > 
   > We should always keep the schema up to date. When data is later written
   > with the old schema, the
   > https://hudi.apache.org/docs/configurations#hoodiedatasourcewritereconcileschema
   > config adapts the data to the new schema, so I don't think we should
   > skip the schema update.
   > 
   




[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231032198

   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * a245595d0c988610d845f6918fe8c5ea76383e92 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231174900

   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11030) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231034589

   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * a245595d0c988610d845f6918fe8c5ea76383e92 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11029) 
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 UNKNOWN
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #6537: [HUDI-4762] Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6537:
URL: https://github.com/apache/hudi/pull/6537#discussion_r972634507


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##########
@@ -286,7 +286,11 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea
           config.getBooleanOrDefault(HIVE_SUPPORT_TIMESTAMP_TYPE));
       if (!schemaDiff.isEmpty()) {
         LOG.info("Schema difference found for " + tableName);
-        syncClient.updateTableSchema(tableName, schema);
+        if (!schemaDiff.getAddColumnTypes().isEmpty() || !schemaDiff.getUpdateColumnTypes().isEmpty()) {

Review Comment:
   We should always keep the schema up to date. When data is later written with the old schema, the https://hudi.apache.org/docs/configurations#hoodiedatasourcewritereconcileschema config adapts the data to the new schema, so I don't think we should skip the schema update.
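
To illustrate the point about reconciliation, here is a rough sketch of what adapting old-schema data to the new schema can look like: each record is projected onto the table's columns and any column it lacks is filled with null. This is not Hudi's implementation; `ReconcileSketch`, `reconcile`, and the column names are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReconcileSketch {
    // Project a record onto the table's columns; columns absent from the record become null.
    static Map<String, Object> reconcile(List<String> tableColumns, Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : tableColumns) {
            out.put(column, record.get(column)); // get() returns null for a missing column
        }
        return out;
    }

    public static void main(String[] args) {
        // schema2 added "new_col"; the incoming record was still written with schema1.
        List<String> schema2 = Arrays.asList("id", "name", "new_col");
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("id", 1);
        oldRecord.put("name", "a");
        System.out.println(reconcile(schema2, oldRecord)); // {id=1, name=a, new_col=null}
    }
}
```

Under this view the table schema stays at schema2, so the schema synced to the metastore should stay at schema2 as well.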





[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231066402

   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * a245595d0c988610d845f6918fe8c5ea76383e92 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11029) 
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11030) 
   




[GitHub] [hudi] parisni closed pull request #6537: [HUDI-4762] Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
parisni closed pull request #6537: [HUDI-4762] Avoid update metastore schema if only missing column in input
URL: https://github.com/apache/hudi/pull/6537




[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231104651

   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * a245595d0c988610d845f6918fe8c5ea76383e92 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11029) 
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11030) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231029568

   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   

