Posted to issues@flink.apache.org by "Sun Shun (Jira)" <ji...@apache.org> on 2022/10/06 09:40:00 UTC

[jira] [Created] (FLINK-29527) Make unknownFieldsIndices work for single ParquetReader

Sun Shun created FLINK-29527:
--------------------------------

             Summary: Make unknownFieldsIndices work for single ParquetReader
                 Key: FLINK-29527
                 URL: https://issues.apache.org/jira/browse/FLINK-29527
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.16.0
            Reporter: Sun Shun


Currently, since the improvement [FLINK-23715], Flink uses a collection named `unknownFieldsIndices` to track nonexistent fields. It is kept inside the `ParquetVectorizedInputFormat` and applied to all Parquet files under the given path.

However, some fields may be missing only from some of the historical Parquet files while existing in the latest ones. Because of `unknownFieldsIndices`, Flink will always skip these fields, even though they exist in the later Parquet files.

As a result, the values of these fields are read as empty, even from files that contain them, whenever the fields are missing from some of the historical Parquet files.
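
The behavior described above can be illustrated with a small, self-contained model. This is only a sketch and not Flink's actual code: the class `UnknownFieldsSketch`, its methods, and the field names `id`, `name`, and `email` are hypothetical, and it assumes that the format-level collection keeps every index that any file has reported as missing. It compares one set shared by all files with a set computed separately for each reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the issue; none of these names come from Flink.
final class UnknownFieldsSketch {

    /** Indices of requested fields that are missing from one file's schema. */
    static Set<Integer> missingIndices(List<String> requested, List<String> fileSchema) {
        Set<Integer> missing = new HashSet<>();
        for (int i = 0; i < requested.size(); i++) {
            if (!fileSchema.contains(requested.get(i))) {
                missing.add(i);
            }
        }
        return missing;
    }

    /** For each requested field, true if it would be read and false if skipped. */
    static List<Boolean> readable(int fieldCount, Set<Integer> skipped) {
        List<Boolean> flags = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            flags.add(!skipped.contains(i));
        }
        return flags;
    }

    /** Assumed current behavior: one set lives at format level and is shared by all files. */
    static List<List<Boolean>> readWithSharedSet(List<String> requested, List<List<String>> files) {
        Set<Integer> shared = new HashSet<>();
        List<List<Boolean>> result = new ArrayList<>();
        for (List<String> schema : files) {
            shared.addAll(missingIndices(requested, schema));
            result.add(readable(requested.size(), shared));
        }
        return result;
    }

    /** Proposed behavior: each reader computes its own set from its own file schema. */
    static List<List<Boolean>> readWithPerReaderSet(List<String> requested, List<List<String>> files) {
        List<List<Boolean>> result = new ArrayList<>();
        for (List<String> schema : files) {
            Set<Integer> perReader = missingIndices(requested, schema);
            result.add(readable(requested.size(), perReader));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList("id", "name", "email");
        List<List<String>> files = Arrays.asList(
                Arrays.asList("id", "name"),            // historical file without "email"
                Arrays.asList("id", "name", "email"));  // latest file with "email"
        System.out.println("shared set:     " + readWithSharedSet(requested, files));
        System.out.println("per-reader set: " + readWithPerReaderSet(requested, files));
    }
}
```

In this model the shared set still marks `email` as skipped for the latest file, while the per-reader set marks it as readable there, which is the difference this issue asks for.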



--
This message was sent by Atlassian Jira
(v8.20.10#820010)