Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/02/17 23:30:00 UTC

[jira] [Closed] (HUDI-3981) Flink engine support for comprehensive schema evolution

     [ https://issues.apache.org/jira/browse/HUDI-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-3981.
---------------------------
    Resolution: Fixed

> Flink engine support for comprehensive schema evolution
> -------------------------------------------------------
>
>                 Key: HUDI-3981
>                 URL: https://issues.apache.org/jira/browse/HUDI-3981
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: flink
>            Reporter: Alexander Trushev
>            Assignee: Alexander Trushev
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> h3. Context
> Currently, the Flink engine does not support the schema evolution described in RFC-33.
> Example 1. Assume Spark changes the type of a column:
> {code:sql}
> set hoodie.schema.on.read.enable=true
> create table t1 (id int, val int, par string) ... partitioned by (par)
> insert into t1 values (1, 10, 'p1')
> alter table t1 alter column val type string
> insert into t1 values (2, 'val20', 'p2')
> {code}
> When Flink then tries to read t1:
> {code:sql}
> create table t1 (id int, val string, par string) partitioned by (par) with (...)
> select * from t1
> {code}
> the following error occurs:
> {noformat}
> java.lang.IllegalArgumentException: Unexpected type: INT32
> {noformat}
> This is only one example; the errors may differ depending on the table type (cow/mor), the query type (snapshot/incremental), the execution mode (batch/streaming), and the kind of change (e.g. rename column, add column).
> Also, it is not yet possible to write data after the schema has changed.
> Example 2. The sequence below leads to errors:
> {noformat}
> flink: write data
> flink: stop job
> spark: modify schema according to RFC-33
> flink: new job with modified schema
> flink: write data
> {noformat}
> h3. Proposal
> Provide full support in the Flink engine for schemas modified according to RFC-33 (add column, rename column, change column type, drop column) in the following cases:
> # batch/streaming
> # mor (snapshot/incremental/optimized) read/write
> # cow (snapshot/incremental) read/write
> # mor compaction
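
To make the four operations named in the proposal concrete, here is a minimal Python sketch of rewriting a record stored under an old schema into the latest schema at read time. It is an illustration only, not Hudi's or Flink's implementation: the function evolve_record and the tuple-based change descriptors are hypothetical names invented for this example, and a real reader would work with the table's stored schema history rather than a hand-written list.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical change descriptors (not Hudi classes):
#   ("add", name, default)        add a column with a default value
#   ("rename", old_name, new_name)
#   ("change_type", name, cast)   cast is a callable such as str
#   ("drop", name)
Change = Tuple[Any, ...]


def evolve_record(record: Dict[str, Any], changes: List[Change]) -> Dict[str, Any]:
    """Return a copy of `record` rewritten to the schema after `changes`."""
    out = dict(record)  # leave the stored record untouched
    for change in changes:
        kind = change[0]
        if kind == "add":
            _, name, default = change
            out.setdefault(name, default)
        elif kind == "rename":
            _, old, new = change
            if old in out:
                out[new] = out.pop(old)
        elif kind == "change_type":
            _, name, cast = change
            # Nulls stay null; only present values are cast.
            if out.get(name) is not None:
                out[name] = cast(out[name])
        elif kind == "drop":
            _, name = change
            out.pop(name, None)
        else:
            raise ValueError(f"unknown schema change: {kind}")
    return out


# Example 1 from the description: val was written as int,
# then the column type was altered to string.
old_row = {"id": 1, "val": 10, "par": "p1"}
new_row = evolve_record(old_row, [("change_type", "val", str)])
print(new_row)
```

In this sketch the row written before the type change is read back with val as the string '10', which is the behaviour a reader needs in order to avoid the "Unexpected type: INT32" failure from Example 1. The same function covers add, rename, and drop by passing the corresponding descriptors.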



--
This message was sent by Atlassian Jira
(v8.20.10#820010)