Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2019/08/20 19:30:00 UTC

[jira] [Updated] (ARROW-6302) [Python] parquet categorical support doesn't preserve order

     [ https://issues.apache.org/jira/browse/ARROW-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-6302:
-----------------------------------------
    Labels: parquet  (was: )

> [Python] parquet categorical support doesn't preserve order
> -----------------------------------------------------------
>
>                 Key: ARROW-6302
>                 URL: https://issues.apache.org/jira/browse/ARROW-6302
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>            Reporter: Galuh Sahid
>            Priority: Major
>              Labels: parquet
>
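> In pandas, I tried round-tripping a DataFrame through Parquet with {{to_parquet}} and {{read_parquet}}. The categorical dtype and its categories are preserved, but the {{ordered}} flag is not: a column created with {{ordered=True}} comes back unordered.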
> {code:python}
> import pandas as pd
>
> path = "example.parquet"  # any writable file path
>
> df = pd.DataFrame()
> # "a" is not one of the categories, so those entries become NaN
> df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)
> df.to_parquet(path)
> actual = pd.read_parquet(path)
>
> print(df["a"])
> # 0    NaN
> # 1      b
> # 2      c
> # 3    NaN
> # Name: a, dtype: category
> # Categories (3, object): [b < c < d]
>
> print(actual["a"])
> # 0    NaN
> # 1      b
> # 2      c
> # 3    NaN
> # Name: a, dtype: category
> # Categories (3, object): [b, c, d]
> # The categories survive, but ordered=True does not.
> {code}
>  
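
A possible workaround, not part of the original report: because the categories themselves survive the round trip, the original CategoricalDtype (categories plus the ordered flag) can be re-applied after reading. The sketch below assumes the original DataFrame's dtypes are still available at read time. {{restore_categoricals}} is a hypothetical helper, and the ordered flag is dropped by hand with {{cat.as_unordered()}} to mimic the reported behaviour without writing a Parquet file.

{code:python}
import pandas as pd


def restore_categoricals(original: pd.DataFrame, roundtripped: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: re-apply each categorical column's original dtype
    (categories and ordered flag) to a DataFrame read back from Parquet."""
    out = roundtripped.copy()
    for col in original.columns:
        dtype = original[col].dtype
        # Only re-cast categorical columns whose dtype changed, e.g. a lost ordered=True.
        if col in out.columns and isinstance(dtype, pd.CategoricalDtype) and out[col].dtype != dtype:
            out[col] = out[col].astype(dtype)
    return out


df = pd.DataFrame()
df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)

# Simulate the reported result in memory: same categories, ordered flag dropped.
actual = df.copy()
actual["a"] = actual["a"].cat.as_unordered()

print(actual["a"].dtype == df["a"].dtype)  # False: the ordered flag differs
restored = restore_categoricals(df, actual)
print(restored["a"].cat.ordered)  # True
{code}

CategoricalDtype equality takes the ordered flag into account, so comparing dtypes is enough to detect the lost flag; the helper only re-casts columns whose dtype actually changed.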



--
This message was sent by Atlassian Jira
(v8.3.2#803003)