Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2021/10/14 08:07:00 UTC

[jira] [Updated] (ARROW-14303) [C++][Parquet] Do not duplicate Schema metadata in Parquet schema metadata and serialized ARROW:schema value

     [ https://issues.apache.org/jira/browse/ARROW-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Molina updated ARROW-14303:
--------------------------------------
    Fix Version/s:     (was: 6.0.0)
                   7.0.0

> [C++][Parquet] Do not duplicate Schema metadata in Parquet schema metadata and serialized ARROW:schema value
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-14303
>                 URL: https://issues.apache.org/jira/browse/ARROW-14303
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 7.0.0
>
>
> Metadata values are being duplicated in the Parquet file footer: we should store them either only in ARROW:schema or only in the Parquet schema metadata. Removing them from the Parquet schema metadata may break applications that expect that metadata to be present in files serialized from Arrow, so dropping the keys from ARROW:schema is probably the safer choice.
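The two options can be compared with the toy model below. This is a minimal sketch under stated assumptions, not Arrow's implementation. The helpers `serialize_schema`, `build_footer_metadata` and `duplicated_keys` are hypothetical. Arrow actually writes ARROW:schema as a base64-encoded IPC (Flatbuffers) schema message, and JSON stands in for that encoding here. The `pandas` key is used as an example of schema metadata that ends up in both places.

```python
import base64
import json

# Hypothetical model of the Parquet footer key-value metadata written for an
# Arrow schema. JSON is only a readable stand-in for Arrow's IPC encoding.


def serialize_schema(fields, metadata):
    """Encode fields plus schema metadata (stand-in for ARROW:schema)."""
    payload = json.dumps({"fields": fields, "metadata": metadata}, sort_keys=True)
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def build_footer_metadata(fields, metadata, drop_from_arrow_schema=False):
    """Return footer key-value metadata for a schema.

    drop_from_arrow_schema=False mimics the behavior described in the issue:
    each key is written as a Parquet key-value pair AND embedded in
    ARROW:schema. True keeps the keys only as Parquet key-value pairs.
    """
    footer = dict(metadata)  # Parquet schema metadata that applications may read
    embedded = {} if drop_from_arrow_schema else dict(metadata)
    footer["ARROW:schema"] = serialize_schema(fields, embedded)
    return footer


def duplicated_keys(footer):
    """Keys stored both as Parquet key-value pairs and inside ARROW:schema."""
    decoded = json.loads(base64.b64decode(footer["ARROW:schema"]))
    parquet_keys = set(footer) - {"ARROW:schema"}
    return parquet_keys & set(decoded["metadata"])


fields = [{"name": "x", "type": "int64"}]
metadata = {"pandas": '{"index_columns": []}'}

before = build_footer_metadata(fields, metadata)
after = build_footer_metadata(fields, metadata, drop_from_arrow_schema=True)
print(sorted(duplicated_keys(before)))  # ['pandas']
print(sorted(duplicated_keys(after)))   # []
print("pandas" in after)                # True
```

In this model, dropping the keys from ARROW:schema removes the duplication while the Parquet-level key stays in the footer, which is why that option is less likely to break existing readers.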



--
This message was sent by Atlassian Jira
(v8.3.4#803005)