Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/08/24 20:07:00 UTC

[jira] [Resolved] (ARROW-6302) [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property

     [ https://issues.apache.org/jira/browse/ARROW-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-6302.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 5185
[https://github.com/apache/arrow/pull/5185]

> [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6302
>                 URL: https://issues.apache.org/jira/browse/ARROW-6302
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>            Reporter: Galuh Sahid
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In pandas, I tried round-tripping a DataFrame through Parquet with {{to_parquet}} and {{read_parquet}}. The categorical dtype is preserved, but its {{ordered}} property is not: an ordered categorical is read back as unordered.
> {code:python}
> >>> import pandas as pd
> >>> from pandas.io.parquet import read_parquet
> >>> df = pd.DataFrame()
> >>> df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)
> >>> df.to_parquet(<path>)
> >>> actual = read_parquet(<path>)
> >>> df["a"]  # before the round trip: ordered
> 0    NaN
> 1      b
> 2      c
> 3    NaN
> Name: a, dtype: category
> Categories (3, object): [b < c < d]
> >>> actual["a"]  # after the round trip: no longer ordered
> 0    NaN
> 1      b
> 2      c
> 3    NaN
> Name: a, dtype: category
> Categories (3, object): [b, c, d]
> {code}
>  
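
The check described in the report can be sketched as a small helper that compares the {{ordered}} flag of each categorical column before and after a round trip. This is only an illustration, not code from the pull request: {{ordered_mismatches}} and {{roundtrip_parquet}} are hypothetical names, and the "after" frame below is a stand-in built with {{cat.as_unordered()}} rather than an actual Parquet read.

```python
import io

import pandas as pd


def ordered_mismatches(expected, actual):
    """Return the categorical columns whose `ordered` flag differs.

    Hypothetical helper for illustration; not part of pandas or Arrow.
    """
    mismatched = []
    for name in expected.columns:
        exp_dtype = expected[name].dtype
        if not isinstance(exp_dtype, pd.CategoricalDtype):
            continue  # only categorical columns carry an `ordered` flag
        act_dtype = actual[name].dtype
        if (not isinstance(act_dtype, pd.CategoricalDtype)
                or act_dtype.ordered != exp_dtype.ordered):
            mismatched.append(name)
    return mismatched


def roundtrip_parquet(frame):
    """Write `frame` to an in-memory Parquet buffer and read it back.

    Hypothetical helper; calling it requires pyarrow to be installed.
    """
    buf = io.BytesIO()
    frame.to_parquet(buf, engine="pyarrow")
    buf.seek(0)
    return pd.read_parquet(buf, engine="pyarrow")


# Same data as in the report.
df = pd.DataFrame(
    {"a": pd.Categorical(["a", "b", "c", "a"],
                         categories=["b", "c", "d"], ordered=True)}
)

# Stand-in for the frame read back in the report: same categories,
# but with the `ordered` flag dropped.
unordered = df.assign(a=df["a"].cat.as_unordered())

print(ordered_mismatches(df, unordered))
print(ordered_mismatches(df, df.copy()))
```

With pyarrow available, {{ordered_mismatches(df, roundtrip_parquet(df))}} would run the same comparison against a real round trip; whether it reports a mismatch depends on the installed Arrow version.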


