Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2022/10/24 15:01:00 UTC

[jira] [Commented] (ARROW-18099) [Python] Cannot create pandas categorical from table only with nulls

    [ https://issues.apache.org/jira/browse/ARROW-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623235#comment-17623235 ] 

Alessandro Molina commented on ARROW-18099:
-------------------------------------------

[~jorisvandenbossche] what are your thoughts on this one? Being able to convert to pandas categoricals seems reasonable; I'm just not sure the result keeps the same semantics for missing/null values.
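
For reference, a minimal sketch of how pandas itself represents an all-null categorical. It uses only pandas, so it says nothing about what pyarrow should produce; it just illustrates the target representation that the question about null semantics refers to.

```python
import pandas as pd

# A categorical built only from missing values has no categories at all;
# pandas marks each missing entry with the sentinel code -1.
cat = pd.Categorical([None, None])

print(cat.categories)  # an empty Index
print(cat.codes)       # one -1 code per missing value
print(cat.isna())      # every entry is reported as missing
```

In this representation the nulls live in the codes rather than in the categories, so whether that matches the meaning of an all-null Arrow column is the open question.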

> [Python] Cannot create pandas categorical from table only with nulls
> --------------------------------------------------------------------
>
>                 Key: ARROW-18099
>                 URL: https://issues.apache.org/jira/browse/ARROW-18099
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>         Environment: OSX 12.6
> M1 silicon
>            Reporter: Damian Barabonkov
>            Priority: Minor
>              Labels: python-conversion
>
> A pyarrow Table whose column contains only null values cannot be converted to a pandas DataFrame with that column as a categorical. However, pandas does support "empty" categoricals. A simple workaround is therefore to convert the pa.Table with the column as object dtype first and then cast it to a categorical in pandas, which yields an empty categorical. That does not fix the underlying pyarrow bug, though.
>  
> Sample reproducible example
> {code:python}
> import pyarrow as pa
> pylist = [{'x': None, '__index_level_0__': 2}, {'x': None, '__index_level_0__': 3}]
> tbl = pa.Table.from_pylist(pylist)
>  
> # Errors
> df_broken = tbl.to_pandas(categories=["x"])
>  
> # Works
> df_works = tbl.to_pandas()
> df_works = df_works.astype({"x": "category"})
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)