Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/02/08 03:07:00 UTC

[jira] [Updated] (ARROW-3801) [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable

     [ https://issues.apache.org/jira/browse/ARROW-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-3801:
--------------------------------
    Fix Version/s: 0.13.0

> [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
> ------------------------------------------------------------------------
>
>                 Key: ARROW-3801
>                 URL: https://issues.apache.org/jira/browse/ARROW-3801
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.10.0
>            Reporter: Thomas Buhrmann
>            Priority: Major
>             Fix For: 0.13.0
>
>
> Serializing and deserializing a pandas Series with categorical dtype through Arrow makes the array backing its categories index read-only. This trips up pandas when, for example, reordering the categories, which raises "ValueError: buffer source array is read-only":
> {code}
> import pandas as pd
> import pyarrow as pa
>
> # Before the roundtrip: the categories are writeable and reordering works.
> df = pd.Series([1,2,3], dtype='category', name="c1").to_frame()
> print("DType before:", repr(df.c1.dtype))
> print("Writeable:", df.c1.cat.categories.values.flags.writeable)
> ro = df.c1.cat.reorder_categories([3,2,1])
> print("DType reordered:", repr(ro.dtype), "\n")
>
> # After a pandas -> Arrow -> pandas roundtrip: the categories are read-only.
> tbl = pa.Table.from_pandas(df)
> df2 = tbl.to_pandas()
> print("DType after:", repr(df2.c1.dtype))
> print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
> # This call raises "ValueError: buffer source array is read-only".
> ro = df2.c1.cat.reorder_categories([3,2,1])
> print("DType reordered:", repr(ro.dtype), "\n")
> {code}
>  
> Outputs:
>  
> {code}
> DType before: CategoricalDtype(categories=[1, 2, 3], ordered=False)
> Writeable: True
> DType reordered: CategoricalDtype(categories=[3, 2, 1], ordered=False)
> DType after: CategoricalDtype(categories=[1, 2, 3], ordered=False)
> Writeable: False
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-365-85b439586c1a> in <module>
>  12 print("DType after:", repr(df2.c1.dtype))
>  13 print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
> ---> 14 ro = df2.c1.cat.reorder_categories([3,2,1])
>  15 print("DType reordered:", repr(ro.dtype), "\n")
> {code}
>  
>  
>  
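
A possible workaround, sketched here under assumptions and not taken from the issue: since the failure is attributed to a read-only categories array, rebuilding the categorical from freshly copied codes and categories should sidestep it. The helper name with_writeable_categories is hypothetical, and this has not been checked against the affected pyarrow version (0.10.0).

```python
import numpy as np
import pandas as pd


def with_writeable_categories(s: pd.Series) -> pd.Series:
    """Rebuild a categorical Series from copies of its codes and categories.

    Hypothetical helper: the copies are fresh NumPy arrays, so they do not
    share memory with any read-only buffer behind the original Series.
    """
    # np.array(..., copy=True) always allocates a new array.
    categories = np.array(s.cat.categories, copy=True)
    codes = np.array(s.cat.codes, copy=True)
    dtype = pd.CategoricalDtype(categories=categories, ordered=s.cat.ordered)
    values = pd.Categorical.from_codes(codes, dtype=dtype)
    return pd.Series(values, index=s.index, name=s.name)


# Usage: in the scenario from the report, the input would be df2.c1.
s = pd.Series([1, 2, 3], dtype="category", name="c1")
fixed = with_writeable_categories(s)
ro = fixed.cat.reorder_categories([3, 2, 1])
print("DType reordered:", repr(ro.dtype))
```

The copy costs extra memory proportional to the column, so it only makes sense as a stopgap until a fixed release is available.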



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)