
Posted to commits@hudi.apache.org by "Yixue (Andrew) Zhu (Jira)" <ji...@apache.org> on 2020/05/23 06:03:00 UTC

[jira] [Commented] (HUDI-741) Fix Hoodie's schema evolution checks

    [ https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114555#comment-17114555 ] 

Yixue (Andrew) Zhu commented on HUDI-741:
-----------------------------------------

I am not sure this Jira clearly explains the rationale for disallowing dropped fields during schema evolution.

Why would a reader schema with fewer fields cause data corruption or loss if the change is intentional, i.e. users no longer care about the dropped fields?
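
To make the question concrete, here is a minimal sketch of the Avro-style resolution rule in play: a field present in the written record but absent from the reader schema is ignored at read time. The helper `project_record` and the field names are hypothetical illustrations, not Hudi or Avro APIs.

```python
# Hypothetical helper (not a Hudi or Avro API): mimics the resolution rule
# that writer fields absent from the reader schema are ignored on read.
def project_record(record, reader_field_names):
    """Return only the fields of `record` that the reader schema declares."""
    return {name: record[name] for name in reader_field_names if name in record}


# A record written with the old schema (illustrative field names).
old_record = {"id": 1, "name": "a", "legacy_flag": True}

# The evolved reader schema intentionally drops `legacy_flag`.
new_reader_fields = ["id", "name"]

projected = project_record(old_record, new_reader_fields)
print(projected)
```

In this sketch the read succeeds and the dropped field is simply not returned; the input record itself is left untouched.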

> Fix Hoodie's schema evolution checks
> ------------------------------------
>
>                 Key: HUDI-741
>                 URL: https://issues.apache.org/jira/browse/HUDI-741
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 120h
>          Time Spent: 20m
>  Remaining Estimate: 119h 40m
>
> HUDI requires a schema to be specified in HoodieWriteConfig, and the HoodieWriteClient uses it to create the records. The schema is also saved in the data files (Parquet format) and log files (Avro format).
> Since a schema is required each time new data is ingested into a HUDI dataset, the schema can evolve over time. But HUDI should ensure that the evolved schema is compatible with the older schema.
> HUDI-specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that data written using the old schema can be read using the new schema.
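
The check described above ("data written using the old schema can be read using the new schema") can be sketched roughly as follows. This is a simplified illustration, not Hudi's validation code: `can_read`, `NO_DEFAULT`, and the dict-based schema representation are assumptions made for the example, and any type change is treated as incompatible, whereas Avro permits certain promotions such as int to long.

```python
NO_DEFAULT = object()  # sentinel meaning "this field declares no default"


def can_read(writer_fields, reader_fields):
    """List the reasons the reader schema cannot read data from the writer schema.

    writer_fields: dict of field name -> type name (the old schema)
    reader_fields: dict of field name -> (type name, default or NO_DEFAULT)
    An empty list means the old data is readable under this simplified rule.
    """
    problems = []
    for name, (field_type, default) in reader_fields.items():
        if name not in writer_fields:
            # A field added in the new schema needs a default for old records.
            if default is NO_DEFAULT:
                problems.append(f"{name}: missing in old data and has no default")
        elif writer_fields[name] != field_type:
            # Simplification: no type promotion is modelled here.
            problems.append(
                f"{name}: type changed from {writer_fields[name]} to {field_type}"
            )
    return problems


old_schema = {"id": "long", "name": "string"}

added_with_default = {
    "id": ("long", NO_DEFAULT),
    "name": ("string", NO_DEFAULT),
    "email": ("string", None),
}
added_without_default = {
    "id": ("long", NO_DEFAULT),
    "name": ("string", NO_DEFAULT),
    "email": ("string", NO_DEFAULT),
}
dropped_field = {"id": ("long", NO_DEFAULT)}

print(can_read(old_schema, added_with_default))
print(can_read(old_schema, added_without_default))
print(can_read(old_schema, dropped_field))
```

Under this sketch, a new schema that only drops a field passes the old-to-new read check, so rejecting dropped fields would need an additional rule beyond the one stated in the description.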


