Posted to dev@flink.apache.org by sunshun18 <su...@126.com> on 2022/12/05 03:54:38 UTC
Patch to support Parquet schema evolution
Hi there,
I found a null-value issue when using Flink to read Parquet files written with multiple versions of a schema (V1->V2->V3->..->Vn).
Assume there are two fields in the given Parquet schema, as below, and field F2 exists only in version 2.
Version1: F1
Version2: F1, F2
Currently, the value of field F2 comes back empty (null) when reading data from the Parquet files using schema Version2.
I explored the implementation and found that Flink uses a collection named `unknownFieldsIndices` to track the nonexistent fields, and that this collection is applied to all Parquet files under the given path.
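To illustrate the symptom, here is a minimal, self-contained sketch. It is not Flink's code: the class, method names, and sample values are hypothetical, and it assumes the set of missing field indices is computed once from the first file and then reused for every file. It compares that with computing the set per file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Flink.
public class SchemaEvolutionSketch {

    /** Indices of requested fields that are absent from one file's schema. */
    static Set<Integer> missingFieldIndices(List<String> requested, List<String> fileSchema) {
        Set<Integer> missing = new HashSet<>();
        for (int i = 0; i < requested.size(); i++) {
            if (!fileSchema.contains(requested.get(i))) {
                missing.add(i);
            }
        }
        return missing;
    }

    /** Builds one row; a field whose index is in 'missing' is filled with null. */
    static List<String> readRow(List<String> requested, List<String> fileSchema,
                                List<String> values, Set<Integer> missing) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            int idx = fileSchema.indexOf(requested.get(i));
            if (missing.contains(i) || idx < 0) {
                row.add(null);
            } else {
                row.add(values.get(idx));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList("F1", "F2");   // read schema: Version2
        List<String> v1Schema = Arrays.asList("F1");          // file written with Version1
        List<String> v2Schema = Arrays.asList("F1", "F2");    // file written with Version2
        List<String> v2Values = Arrays.asList("a2", "b2");

        // Assumed problem: one set, derived from the Version1 file, reused for all files.
        Set<Integer> shared = missingFieldIndices(requested, v1Schema);
        System.out.println(readRow(requested, v2Schema, v2Values, shared));  // [a2, null]

        // Per-file alternative: recompute the set from each file's own schema.
        Set<Integer> perFile = missingFieldIndices(requested, v2Schema);
        System.out.println(readRow(requested, v2Schema, v2Values, perFile)); // [a2, b2]
    }
}
```

With the shared set, F2 is null even for the file that contains it; with a per-file set, F2 is null only for files written with Version1.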
I drafted a patch, with a unit test, to fix this issue.
https://issues.apache.org/jira/browse/FLINK-29527
https://github.com/apache/flink/pull/21149
As this PR has been pending for a long time, I hope a committer can help review it and provide feedback if possible.
Thanks!
Shun
Re: Patch to support Parquet schema evolution
Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Hi, Shun.
Thanks for the contribution. I'll have a look first and then find some committers to help review and merge it.
Best regards,
Yuxia