Posted to commits@hudi.apache.org by "Prashant Wason (Jira)" <ji...@apache.org> on 2020/04/10 22:46:00 UTC

[jira] [Commented] (HUDI-741) Fix Hoodie's schema evolution checks

    [ https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081014#comment-17081014 ] 

Prashant Wason commented on HUDI-741:
-------------------------------------

Update: [~varadarb] informed me that the schema is also available in the Hoodie commit metadata, under extraMetadata. This makes it simpler to fetch the last used schema for the checks.
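
A minimal sketch of that lookup, assuming the commit metadata is serialized as JSON with an `extraMetadata` map and that the schema string sits under a `schema` key. The key name, the helper `last_schema_from_commit`, and the sample payload are assumptions for illustration, not taken from this thread:

```python
import json


def last_schema_from_commit(commit_json: str):
    """Return the schema stored in a commit's extraMetadata, or None if absent."""
    metadata = json.loads(commit_json)
    extra = metadata.get("extraMetadata") or {}
    schema_str = extra.get("schema")  # assumed key name
    return json.loads(schema_str) if schema_str else None


# Hypothetical commit payload, trimmed to the part that matters here.
sample_commit = json.dumps({
    "extraMetadata": {
        "schema": json.dumps({
            "type": "record",
            "name": "trip",
            "fields": [{"name": "id", "type": "string"}],
        })
    }
})

last_schema = last_schema_from_commit(sample_commit)
```

Returning None when no schema is recorded lets the caller skip the check for the first commit on a dataset instead of failing.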

> Fix Hoodie's schema evolution checks
> ------------------------------------
>
>                 Key: HUDI-741
>                 URL: https://issues.apache.org/jira/browse/HUDI-741
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 120h
>          Time Spent: 10m
>  Remaining Estimate: 119h 50m
>
> HUDI requires a schema to be specified in HoodieWriteConfig, and HoodieWriteClient uses that schema to create the records. The schema is also saved in the data files (Parquet format) and log files (Avro format).
> Since a schema is required each time new data is ingested into a HUDI dataset, the schema can evolve over time. HUDI should therefore ensure that the evolved schema is compatible with the older schema.
> HUDI-specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that data written with the old schema can be read with the new schema.
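
The "old data must be readable with the new schema" rule can be sketched roughly as follows. This is a deliberately simplified illustration over Avro-style record schemas held as plain dicts, not HUDI's implementation: the function `can_read` and the sample schemas are hypothetical, any type change is treated as incompatible (Avro itself permits some promotions, such as int to long), and nested records are not descended into. A real check would more likely rely on Avro's own resolution rules, for example `SchemaCompatibility.checkReaderWriterCompatibility` in the Avro Java library.

```python
def can_read(reader: dict, writer: dict) -> list:
    """List the problems that stop `reader` from reading data written with `writer`.

    An empty list means the new (reader) schema is considered compatible.
    """
    writer_fields = {f["name"]: f for f in writer["fields"]}
    problems = []
    for field in reader["fields"]:
        old = writer_fields.get(field["name"])
        if old is None:
            # A field unknown to old data needs a default to fill in.
            if "default" not in field:
                problems.append(f"new field '{field['name']}' has no default")
        elif old["type"] != field["type"]:
            # Simplification: no type promotions are allowed here.
            problems.append(f"field '{field['name']}' changed type")
    # Fields dropped by the reader are ignored when reading, so they are not flagged.
    return problems


# Hypothetical schemas for illustration.
old_schema = {"type": "record", "name": "trip",
              "fields": [{"name": "id", "type": "string"}]}
good_schema = {"type": "record", "name": "trip",
               "fields": [{"name": "id", "type": "string"},
                          {"name": "fare", "type": ["null", "double"], "default": None}]}
bad_schema = {"type": "record", "name": "trip",
              "fields": [{"name": "id", "type": "string"},
                         {"name": "fare", "type": "double"}]}
```

Here `can_read(good_schema, old_schema)` returns an empty list because the added field carries a default, while `can_read(bad_schema, old_schema)` reports the missing default on `fare`.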



--
This message was sent by Atlassian Jira
(v8.3.4#803005)