Posted to commits@hudi.apache.org by "Daniel Kaźmirski (Jira)" <ji...@apache.org> on 2022/06/17 10:49:00 UTC

[jira] [Updated] (HUDI-4276) Reconcile schema - inject null values for missing fields and add new fields

     [ https://issues.apache.org/jira/browse/HUDI-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Kaźmirski updated HUDI-4276:
-----------------------------------
    Description: 
Improve schema reconciliation to make it more flexible when full schema evolution is enabled.

Desired behavior:
 # incoming data is missing columns that are already defined in the table -> null values will be injected into the missing columns
 # incoming data contains new columns not yet defined in the table -> the new columns will be added to the table schema (incoming dataframe?)
 # incoming data is missing columns that are already defined in the table and also contains new columns not yet defined in the table -> the new columns will be added to the table schema, and null values will be injected into the missing columns

No column should be dropped by the Hive sync utility when schema reconciliation is enabled.
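A minimal sketch of the desired behavior, written in Python purely for illustration. It assumes schemas are ordered lists of (column name, type) pairs and records are plain dicts; `reconcile_schema` and `align_record` are hypothetical helpers, not Hudi APIs, and type changes on existing columns are treated as out of scope. The reconciled schema is the union of the table and incoming schemas, so no table column is ever dropped:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical representation (not Hudi's): ordered (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]
Record = Dict[str, Any]


def reconcile_schema(table_schema: Schema, incoming_schema: Schema) -> Schema:
    """Union of both schemas (cases 2 and 3).

    Existing table columns keep their order and are never dropped;
    columns that only exist in the incoming data are appended at the end.
    """
    table_types = dict(table_schema)
    merged = list(table_schema)
    for name, typ in incoming_schema:
        if name not in table_types:
            merged.append((name, typ))
        elif table_types[name] != typ:
            # Assumption: type evolution is out of scope for this sketch.
            raise ValueError(
                f"type mismatch for column {name!r}: {table_types[name]} vs {typ}"
            )
    return merged


def align_record(record: Record, schema: Schema) -> Record:
    """Project a record onto the reconciled schema (cases 1 and 3).

    Every column the incoming record does not carry is filled with None (null).
    """
    return {name: record.get(name) for name, _ in schema}


if __name__ == "__main__":
    table = [("id", "long"), ("name", "string"), ("age", "int")]
    # Case 3: "name" and "age" are missing, "email" is new.
    incoming = [("id", "long"), ("email", "string")]
    merged = reconcile_schema(table, incoming)
    print(merged)
    # [('id', 'long'), ('name', 'string'), ('age', 'int'), ('email', 'string')]
    print(align_record({"id": 1, "email": "a@example.com"}, merged))
    # {'id': 1, 'name': None, 'age': None, 'email': 'a@example.com'}
```

Appending new columns after the existing ones keeps the position of every table column unchanged; whether the new columns should also be added to the incoming dataframe (the open question in case 2) is left undecided here.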

Related GH issue:
[https://github.com/apache/hudi/issues/5873]

 

  was:
Improve schema reconciliation to make it more flexible in presence of full schema evolution enabled.



Desired behavior:
 # incoming data has missing columns that were already defined in the table –> null values will be injected into missing columns 
 # incoming data contains new columns not defined yet in the table -> columns will be added to the table schema (incoming dataframe?)
 # incoming data has missing columns in the table and new columns in the table -> new columns will be added to the table schema, missing columns will be injected with null values

No column should be dropped when using hive sync utility.

Related GH issue:
[https://github.com/apache/hudi/issues/5873]

 


> Reconcile schema - inject null values for missing fields and add new fields
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-4276
>                 URL: https://issues.apache.org/jira/browse/HUDI-4276
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Daniel Kaźmirski
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)