Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/02/17 23:30:00 UTC
[jira] [Closed] (HUDI-3981) Flink engine support for comprehensive schema evolution
[ https://issues.apache.org/jira/browse/HUDI-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo closed HUDI-3981.
---------------------------
Resolution: Fixed
> Flink engine support for comprehensive schema evolution
> -------------------------------------------------------
>
> Key: HUDI-3981
> URL: https://issues.apache.org/jira/browse/HUDI-3981
> Project: Apache Hudi
> Issue Type: New Feature
> Components: flink
> Reporter: Alexander Trushev
> Assignee: Alexander Trushev
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> h3. Context
> Currently, the Flink engine does not support the schema evolution described in RFC-33.
> Example 1. Assume Spark changes the type of a column:
> {code:sql}
> set hoodie.schema.on.read.enable=true
> create table t1 (id int, val int, par string) ... partitioned by (par)
> insert into t1 values (1, 10, 'p1')
> alter table t1 alter column val type string
> insert into t1 values (2, 'val20', 'p2')
> {code}
> When Flink then tries to read t1:
> {code:sql}
> create table t1 (id int, val string, par string) partitioned by (par) with (...)
> select * from t1
> {code}
> the following error occurs:
> {noformat}
> java.lang.IllegalArgumentException: Unexpected type: INT32
> {noformat}
> This is just one example; the errors may differ depending on the case: cow/mor, snapshot/incremental, batch/streaming, rename column/add column.
> In addition, Flink cannot yet write data once the schema has been changed.
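The failure in Example 1 comes from decoding a file written with the old schema (val int) using the new table schema (val string). A minimal sketch of the reconciliation step a reader would need is given here. It is a simplified model and not Hudi's or Flink's actual API: the names `FILE_SCHEMA`, `QUERY_SCHEMA`, `CASTS`, and `reconcile` are hypothetical, and matching columns by name is a simplification that would not handle renames.

```python
# Hypothetical, simplified model of schema-on-read; not Hudi's actual API.
FILE_SCHEMA = {"id": "int", "val": "int", "par": "string"}      # schema the old file was written with
QUERY_SCHEMA = {"id": "int", "val": "string", "par": "string"}  # schema after the ALTER COLUMN

# Conversions from a file type to a query type (assumed subset, for illustration only).
CASTS = {
    ("int", "int"): lambda v: v,
    ("string", "string"): lambda v: v,
    ("int", "string"): str,
}


def reconcile(row, file_schema, query_schema):
    """Convert a row decoded with file_schema into the shape of query_schema."""
    out = {}
    for name, query_type in query_schema.items():
        if name not in file_schema:
            out[name] = None  # column was added after the file was written
            continue
        cast = CASTS.get((file_schema[name], query_type))
        if cast is None:
            raise ValueError(
                f"cannot convert {name}: {file_schema[name]} -> {query_type}"
            )
        out[name] = cast(row[name])
    return out


old_row = {"id": 1, "val": 10, "par": "p1"}  # the row inserted before the type change
print(reconcile(old_row, FILE_SCHEMA, QUERY_SCHEMA))
# prints {'id': 1, 'val': '10', 'par': 'p1'}
```

The point of the sketch is that the old file is decoded with its own schema first and only then converted, instead of being decoded directly with the new type.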
> Example 2. The following sequence leads to errors:
> {noformat}
> flink: write data
> flink: stop job
> spark: modify schema according to RFC-33
> flink: new job with modified schema
> flink: write data
> {noformat}
> h3. Proposal
> Provide full support in the Flink engine for tables whose schema is modified according to RFC-33
> (add column, rename column, change type of column, drop column) in the following cases:
> # batch/streaming
> # mor (snapshot/incremental/optimized) read/write
> # cow (snapshot/incremental) read/write
> # mor compaction
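The proposal spans several independent dimensions, so the number of read scenarios grows multiplicatively. The following sketch enumerates them. The labels are taken from the proposal; the tuple layout and the names `CHANGES`, `MODES`, `READ_QUERIES`, and `read_cases` are assumptions for illustration, and write paths and mor compaction are left out.

```python
from itertools import product

# Labels come from the proposal; the structure is an assumption for illustration.
CHANGES = ["add column", "rename column", "change type of column", "drop column"]
MODES = ["batch", "streaming"]
READ_QUERIES = {
    "mor": ["snapshot", "incremental", "optimized"],
    "cow": ["snapshot", "incremental"],
}


def read_cases():
    """Yield every (change, mode, table type, query type) read combination."""
    for table_type, queries in READ_QUERIES.items():
        for change, mode, query in product(CHANGES, MODES, queries):
            yield (change, mode, table_type, query)


cases = list(read_cases())
print(len(cases))  # 4 changes * 2 modes * (3 + 2) query types = 40
```

A list like this could serve as a checklist when verifying that each combination is covered.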