Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/17 23:13:50 UTC

[GitHub] [hudi] kazdy opened a new issue, #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

kazdy opened a new issue, #5899:
URL: https://github.com/apache/hudi/issues/5899

   **Describe the problem you faced**
   
   I've tried using MERGE INTO with UPDATE * and INSERT * statements with full schema evolution enabled.
   I've noticed that during the insert, new columns from the incoming data (columns that do not exist in the target table yet) are dropped and the target schema is applied.
   
   Therefore, can we as users have the schema evolve automatically on MERGE INTO operations?
   I think this should only be supported when the merge uses UPDATE SET * and INSERT *.
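The proposed rule (evolve the schema only for `UPDATE SET *` / `INSERT *`) can be sketched as a simple check. This is a minimal, hypothetical model: `MergeClause` and `allows_auto_evolution` are illustrative names, not Hudi or Spark APIs, and a real implementation would inspect Spark's resolved merge actions instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergeClause:
    """Hypothetical representation of one WHEN [NOT] MATCHED clause."""
    action: str    # "UPDATE", "INSERT" or "DELETE"
    is_star: bool  # True for UPDATE SET * / INSERT *


def allows_auto_evolution(clauses: List[MergeClause]) -> bool:
    """Allow automatic schema evolution only if every UPDATE/INSERT clause uses '*'.

    DELETE clauses write no columns, so they are ignored by the check.
    """
    writes = [c for c in clauses if c.action in ("UPDATE", "INSERT")]
    return bool(writes) and all(c.is_star for c in writes)


star_merge = [MergeClause("UPDATE", True), MergeClause("INSERT", True)]
explicit_merge = [MergeClause("UPDATE", False), MergeClause("INSERT", True)]
print(allows_auto_evolution(star_merge))      # True
print(allows_auto_evolution(explicit_merge))  # False
```

One plausible reason for the restriction: with explicit assignments the user already names every target column, so no unmapped source column is left to carry over.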
   
   **Expected behavior**
   
   When incoming data is missing columns that are already declared in the target table, those columns should be filled with default/null values.
   When incoming data has new columns that are not yet declared in the target table, these should be added to the target table.
   When incoming data has both missing columns and new columns, the missing columns should be filled with null/default values and the new columns should be added to the target table.
   
   New columns should be reflected in the metastore table schema.
   
   It would be great to support complex types and nested schemas.
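The expected behavior above can be sketched with plain Python structures. This is a toy model under stated assumptions: schemas are flat, ordered lists of `(name, type)` pairs, type changes on existing columns are ignored, and `evolve_schema` / `align_row` are hypothetical helpers, not part of Hudi or Spark. Nested and complex types would need a recursive version of the same merge.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical flat-schema model: an ordered list of (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]
Row = Dict[str, Any]


def evolve_schema(target: Schema, incoming: Schema) -> Schema:
    """Keep every target column in order, then append incoming columns the target lacks.

    If a column exists in both, the target's type is kept (type changes are out of scope).
    """
    known = {name for name, _ in target}
    added = [(name, typ) for name, typ in incoming if name not in known]
    return target + added


def align_row(row: Row, schema: Schema, defaults: Optional[Row] = None) -> Row:
    """Project a row onto the schema, filling missing columns with a default or None."""
    defaults = defaults or {}
    return {name: row.get(name, defaults.get(name)) for name, _ in schema}


target = [("id", "int"), ("name", "string"), ("price", "double")]
incoming = [("id", "int"), ("name", "string"), ("discount", "double")]  # no "price"

evolved = evolve_schema(target, incoming)
print(evolved)
# [('id', 'int'), ('name', 'string'), ('price', 'double'), ('discount', 'double')]
print(align_row({"id": 1, "name": "a", "discount": 0.1}, evolved))
# {'id': 1, 'name': 'a', 'price': None, 'discount': 0.1}
```

Appending new columns at the end keeps existing column positions stable, which is the least disruptive choice for readers of the old schema.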
   
   Thread from dev mailing list as a reference:
   https://lists.apache.org/thread/kr59hh7yqr2c1y33kzfv3n97h6ydbz9b
   
   **Environment Description**
   
   * Hudi version : 0.11 
   
   * Spark version : 3.2.0-amzn
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kazdy commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
kazdy commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1250052770

   @nsivabalan @xiarixiaoyao I created a Jira ticket, [HUDI-4872](https://issues.apache.org/jira/browse/HUDI-4872), so I'm closing this issue.




[GitHub] [hudi] kazdy commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
kazdy commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1160564716

   @codope It's related because it also calls for automatic schema evolution with similar logic.
   With MERGE INTO, new fields are being dropped, so once [HUDI-4276](https://issues.apache.org/jira/browse/HUDI-4276) is done, I guess it could be used in MERGE INTO as well?
   
   Looking at `MergeIntoHoodieTableCommand`, I think the target table schema is always applied on write, so any new column is simply dropped from the source DataFrame.
   This happens because of this line:
   `HoodieWriteConfig.WRITE_SCHEMA.key -> getTableSchema.toString`
   where `getTableSchema` returns the target table schema.
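The effect described above can be illustrated with a toy model. It assumes, as the comment suggests, that every incoming row is projected onto the target table schema before writing; `write_with_target_schema` is a hypothetical helper, not Hudi's actual code path.

```python
from typing import Any, Dict, List, Tuple

Schema = List[Tuple[str, str]]  # ordered (column_name, type_name) pairs
Row = Dict[str, Any]


def write_with_target_schema(rows: List[Row], target: Schema) -> List[Row]:
    """Model of the reported behavior: each row is projected onto the target schema,
    so any column the target does not declare is silently discarded."""
    columns = [name for name, _ in target]
    return [{name: row.get(name) for name in columns} for row in rows]


target = [("id", "int"), ("name", "string")]
source_rows = [{"id": 1, "name": "a", "new_col": "x"}]

written = write_with_target_schema(source_rows, target)
print(written)  # [{'id': 1, 'name': 'a'}]  ("new_col" was dropped)
```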
   




[GitHub] [hudi] nsivabalan commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1163647932

   @xiarixiaoyao: Can you look into this issue? It looks like it's related to schema evolution.
   




[GitHub] [hudi] nsivabalan commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1216252616

   Yes, that's my understanding too.
   




[GitHub] [hudi] kazdy commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
kazdy commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1166266254

   Hi @xiarixiaoyao, thanks for the response :)
   So full schema evolution is a separate feature from automatic evolution, but automatic evolution itself can be supported as its own feature? Do I understand you correctly?
   




[GitHub] [hudi] xiarixiaoyao commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1166239652

   @kazdy No, full schema evolution cannot support automatic evolution. I am working on that feature, and of course complex types and nested schemas will be supported.




[GitHub] [hudi] codope commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1160538495

   @kazdy Isn't this the same as HUDI-4276?
   cc @xiarixiaoyao 




[GitHub] [hudi] nsivabalan commented on issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5899:
URL: https://github.com/apache/hudi/issues/5899#issuecomment-1216253081

   @xiarixiaoyao: If we have a tracking Jira, we can close the GitHub issue.




[GitHub] [hudi] kazdy closed issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature

Posted by GitBox <gi...@apache.org>.
kazdy closed issue #5899: [SUPPORT] MERGE INTO with UPDATE * / INSERT * - new incoming columns dropped, automatic schema evolution feature
URL: https://github.com/apache/hudi/issues/5899

