Posted to jira@arrow.apache.org by "Damian Barabonkov (Jira)" <ji...@apache.org> on 2022/10/19 13:34:00 UTC
[jira] [Updated] (ARROW-18099) Cannot create pandas categorical from table only with nulls
[ https://issues.apache.org/jira/browse/ARROW-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damian Barabonkov updated ARROW-18099:
--------------------------------------
Description:
A pyarrow Table column that contains only null values cannot be converted to a pandas DataFrame with that column as a categorical, even though pandas itself supports "empty" categoricals (no categories, all values missing). A simple workaround is to convert the pa.Table without the categories argument, so the column is loaded as object dtype, and then cast it to a categorical in pandas, which yields an empty categorical. This does not fix the underlying pyarrow bug.
Minimal reproducible example:
{code:python}
import pyarrow as pa

pylist = [{'x': None, '__index_level_0__': 2}, {'x': None, '__index_level_0__': 3}]
tbl = pa.Table.from_pylist(pylist)

# Errors: requesting the all-null column as a categorical during conversion
df_broken = tbl.to_pandas(categories=["x"])

# Works: convert first, then cast to a categorical in pandas
df_works = tbl.to_pandas()
df_works = df_works.astype({"x": "category"})
{code}
> Cannot create pandas categorical from table only with nulls
> -----------------------------------------------------------
>
> Key: ARROW-18099
> URL: https://issues.apache.org/jira/browse/ARROW-18099
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 9.0.0
> Environment: OSX 12.6
> M1 silicon
> Reporter: Damian Barabonkov
> Priority: Minor
>
> A pyarrow Table column that contains only null values cannot be converted to a pandas DataFrame with that column as a categorical, even though pandas itself supports "empty" categoricals (no categories, all values missing). A simple workaround is to convert the pa.Table without the categories argument, so the column is loaded as object dtype, and then cast it to a categorical in pandas, which yields an empty categorical. This does not fix the underlying pyarrow bug.
>
> Minimal reproducible example:
> {code:python}
> import pyarrow as pa
>
> pylist = [{'x': None, '__index_level_0__': 2}, {'x': None, '__index_level_0__': 3}]
> tbl = pa.Table.from_pylist(pylist)
>
> # Errors: requesting the all-null column as a categorical during conversion
> df_broken = tbl.to_pandas(categories=["x"])
>
> # Works: convert first, then cast to a categorical in pandas
> df_works = tbl.to_pandas()
> df_works = df_works.astype({"x": "category"})
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)