Posted to issues@arrow.apache.org by "Galuh Sahid (Jira)" <ji...@apache.org> on 2019/08/22 03:26:00 UTC

[jira] [Commented] (ARROW-6302) [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property

    [ https://issues.apache.org/jira/browse/ARROW-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912900#comment-16912900 ] 

Galuh Sahid commented on ARROW-6302:
------------------------------------

I'd like to attempt this if it's OK. 

[~jorisvandenbossche] To make sure I understand the scope: is https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader_internal.cc the only file that needs to be changed, or are there other files that need changes as well?

> [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6302
>                 URL: https://issues.apache.org/jira/browse/ARROW-6302
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>            Reporter: Galuh Sahid
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.15.0
>
>
> In pandas, I tried round-tripping a DataFrame through Parquet with {{to_parquet}} and {{read_parquet}}. The categorical dtype is preserved, but its "ordered" property is not: the categories come back unordered.
> {code:python}
> >>> import pandas as pd
> >>> from pandas.io.parquet import read_parquet
> >>> df = pd.DataFrame()
> >>> df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)
> >>> df.to_parquet(<path>)
> >>> actual = read_parquet(<path>)
> >>> df["a"]
> 0    NaN
> 1      b
> 2      c
> 3    NaN
> Name: a, dtype: category
> Categories (3, object): [b < c < d]
> >>> actual["a"]
> 0    NaN
> 1      b
> 2      c
> 3    NaN
> Name: a, dtype: category
> Categories (3, object): [b, c, d]
> {code}
>  
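On versions affected by this bug, a caller who still has the original dtype can restore the ordering after reading. Below is a minimal pandas-only sketch that assumes the original `CategoricalDtype` is still available. `restore_order` is a hypothetical helper, not a pandas or pyarrow API, and the `read_parquet` result is simulated in memory rather than read from a file.

```python
import pandas as pd


def restore_order(original: pd.Series, roundtripped: pd.Series) -> pd.Series:
    """Hypothetical helper: re-apply the original CategoricalDtype
    (same categories, same ``ordered`` flag) to a column read back from Parquet."""
    if not isinstance(original.dtype, pd.CategoricalDtype):
        raise TypeError("original column is not categorical")
    return roundtripped.astype(original.dtype)


# The column from the report: "a" is not among the categories, so it becomes NaN.
expected = pd.Series(
    pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)
)
# Simulated read_parquet result on an affected version: same categories, ordered flag lost.
actual = pd.Series(pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"]))

print(actual.dtype.ordered)           # False
fixed = restore_order(expected, actual)
print(fixed.dtype.ordered)            # True
print(fixed.dtype == expected.dtype)  # True
```

The sketch re-applies the full dtype instead of only calling `.cat.as_ordered()`. This also guards against the categories coming back in a different order, because `astype` with a `CategoricalDtype` sets both the categories and the `ordered` flag.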



--
This message was sent by Atlassian Jira
(v8.3.2#803003)