Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/08/20 21:53:00 UTC
[jira] [Updated] (ARROW-6302) [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property
[ https://issues.apache.org/jira/browse/ARROW-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-6302:
--------------------------------
Summary: [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property (was: [Python] parquet categorical support doesn't preserve order)
> [Python][Parquet] Reading dictionary type with serialized Arrow schema does not restore "ordered" type property
> ---------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-6302
> URL: https://issues.apache.org/jira/browse/ARROW-6302
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.0
> Reporter: Galuh Sahid
> Priority: Major
> Labels: parquet
> Fix For: 0.15.0
>
>
> In pandas, I tried round-tripping a DataFrame through Parquet with {{to_parquet}} and {{read_parquet}}. The round trip preserves the categorical dtype and its categories, but not the {{ordered}} flag: an ordered categorical comes back unordered.
> {code:python}
> import pandas as pd
> from pandas.io.parquet import read_parquet
> df = pd.DataFrame()
> df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)
> df.to_parquet(<path>)
> actual = read_parquet(<path>)
> # Original column: ordered categorical (note the "<" between categories)
> df["a"]
> 0 NaN
> 1 b
> 2 c
> 3 NaN
> Name: a, dtype: category
> Categories (3, object): [b < c < d]
> # After the round trip: same categories, but the ordering is lost
> actual["a"]
> 0 NaN
> 1 b
> 2 c
> 3 NaN
> Name: a, dtype: category
> Categories (3, object): [b, c, d]
> {code}
>
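Until the flag is restored on read, one possible workaround is to re-apply the ordering on the pandas side after loading. The sketch below is only an illustration, not part of Arrow or pandas: {{restore_ordered}} is a hypothetical helper, and the Parquet round trip is simulated with {{cat.as_unordered()}} so the snippet runs without touching a file. It assumes the categories themselves come back in the same order, as in the report above.

```python
import pandas as pd


def restore_ordered(reference: pd.DataFrame, loaded: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy the `ordered` flag from `reference` onto `loaded`.

    Only columns that are ordered categoricals in `reference` and categoricals
    in `loaded` are touched; everything else is returned unchanged.
    """
    fixed = loaded.copy()
    for name in reference.columns:
        ref_dtype = reference[name].dtype
        if (
            isinstance(ref_dtype, pd.CategoricalDtype)
            and ref_dtype.ordered
            and name in fixed.columns
            and isinstance(fixed[name].dtype, pd.CategoricalDtype)
        ):
            # Keep the categories as loaded and only mark them as ordered.
            fixed[name] = fixed[name].cat.as_ordered()
    return fixed


df = pd.DataFrame()
df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)

# Stand-in for the Parquet round trip (assumption): same categories, flag lost.
loaded = pd.DataFrame({"a": df["a"].cat.as_unordered()})

restored = restore_ordered(df, loaded)
print(restored["a"].dtype.ordered)
```

This needs the original frame (or at least its dtypes) to be available at read time, so it is a stopgap for round-trip tests rather than a substitute for restoring the property from the serialized Arrow schema.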