Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/16 13:34:45 UTC

[GitHub] [hudi] kazdy commented on issue #5873: [SUPPORT] Reconcile schema - missing field dropped from metadata

kazdy commented on issue #5873:
URL: https://github.com/apache/hudi/issues/5873#issuecomment-1157669265

   @xiarixiaoyao I was hoping that with schema reconciliation "default values will be injected to missing fields" as per the docs:
   
   > When a new batch of write has records with old schema, but latest table schema got evolved, this config will upgrade the records to leverage latest table schema(default values will be injected to missing fields). If not, the write batch would fail.
   
   The scenario I described does not happen when a batch has a missing column but no new column. In that case Hudi injects null into the missing column, and the column is not removed from the table in the metastore.
   
   The behavior I'm looking for is this:
   if incoming data doesn't contain every column in the table, the missing columns are simply assigned null/default values.
   Other similar frameworks allow this, so I guess Hudi could do the same, possibly as an option guarded by a config for users who prefer to enforce the schema more strictly.
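
To make the expected behavior concrete, here is a minimal plain-Python sketch. It is not Hudi code and does not use any Hudi API; `reconcile_batch`, `table_columns`, and `defaults` are hypothetical names used only for illustration. It assumes the reconciled schema is the table's columns plus any new columns in the batch, and that missing values fall back to a per-column default or null.

```python
from typing import Any, Dict, List, Optional, Tuple


def reconcile_batch(
    table_columns: List[str],
    batch: List[Dict[str, Any]],
    defaults: Optional[Dict[str, Any]] = None,
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Hypothetical illustration of the desired reconciliation, not Hudi's implementation."""
    defaults = defaults or {}

    # Start from the table's columns so that none of them can be dropped.
    merged_columns = list(table_columns)

    # Append columns that appear only in the incoming batch.
    for record in batch:
        for name in record:
            if name not in merged_columns:
                merged_columns.append(name)

    # Fill every column absent from a record with its default, or None.
    reconciled = [
        {name: record.get(name, defaults.get(name)) for name in merged_columns}
        for record in batch
    ]
    return merged_columns, reconciled


# The batch lacks "price" and introduces "discount" at the same time.
table_columns = ["id", "name", "price"]
batch = [{"id": 1, "name": "a", "discount": 0.1}]
columns, rows = reconcile_batch(table_columns, batch)
```

In this sketch `price` stays in the schema and is written as null even though the same batch adds `discount`, which is the case where I see the column dropped from the metastore.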
   
   I also found a comment on another Hudi issue that makes me think my scenario should work:
   
   > @TarunMootala can you upgrade Hudi to 0.10.1? It can reconcile the schema wherever the new field is put in. Spark-SQL still has some problems where the new middle field can't be shown.
   > But I tested on the master branch, and all of the problems above are gone.
   
   _Originally posted by @YannByron in https://github.com/apache/hudi/issues/4914#issuecomment-1063623677_
   
   

