Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/04/30 18:38:00 UTC

[jira] [Commented] (ARROW-8657) [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

    [ https://issues.apache.org/jira/browse/ARROW-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096866#comment-17096866 ] 

Wes McKinney commented on ARROW-8657:
-------------------------------------

For the record, I think we need to introduce a new flag to toggle the use of newer logical types and the associated casting/metadata behavior, and leave the 1.0/2.0 flag for its intended use, i.e. selecting DataPageV1 vs. DataPageV2.

So my suggested fix is:

* Add the new flag that is separate from switching version 1.0/2.0
* Revert the behavior in Python of version='2.0' to use DataPageV1, **but emit a FutureWarning to get people to use the new flag**
* In a future release (maybe 2 releases from now), {{version='2.0'}} will again write DataPageV2
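
A minimal sketch of how the transitional behavior in the steps above could look, in plain Python. All names here (WriterOptions, resolve_writer_options, new_logical_types) are hypothetical and only for illustration; they are not pyarrow's API, and the real flag name is still to be decided.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WriterOptions:
    """Hypothetical container for the two now-independent settings."""
    data_page_version: str    # "V1" or "V2"
    new_logical_types: bool   # newer logical types and casting/metadata behavior


def resolve_writer_options(version: str = "1.0",
                           new_logical_types: Optional[bool] = None) -> WriterOptions:
    """Map user-facing arguments to writer settings for the transitional release.

    Assumption: the new flag defaults to None, meaning "not given", so the
    0.16 meaning of version='2.0' can be preserved while warning the caller.
    """
    if version not in ("1.0", "2.0"):
        raise ValueError(f"unsupported Parquet version: {version!r}")

    if new_logical_types is None:
        # Legacy path: version='2.0' implied the newer logical types in 0.16.
        new_logical_types = version == "2.0"
        if version == "2.0":
            warnings.warn(
                "version='2.0' will write DataPageV2 in a future release; "
                "pass new_logical_types=True to keep the newer logical types",
                FutureWarning,
                stacklevel=2,
            )

    # Transitional release: always write DataPageV1, regardless of version.
    return WriterOptions(data_page_version="V1",
                         new_logical_types=new_logical_types)
```

With this sketch, resolve_writer_options("2.0") keeps the 0.16 logical type behavior, writes DataPageV1 pages, and emits a FutureWarning, while passing new_logical_types explicitly suppresses the warning. In the later release, only the final return would change so that version='2.0' selects "V2".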

> [Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8657
>                 URL: https://issues.apache.org/jira/browse/ARROW-8657
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.17.0
>            Reporter: Pierre Belzile
>            Priority: Major
>
> With the recent release of 0.17, the ParquetVersion is used to define the logical type interpretation of fields and the selection of the DataPage format.
> As a result, all parquet files that were created with ParquetVersion::V2 to get features such as unsigned int32s, timestamps with nanosecond resolution, etc. are now unreadable. That's TBs of data in my case.
> Those two concerns should be separated. Given that DataPageV2 pages were not written prior to 0.17, and in order to allow reading existing files, the existing version property should continue to operate as in 0.16 and inform the logical type mapping.
> Some consideration should be given to issuing a 0.17.1 release.
>  


