Posted to jira@arrow.apache.org by "Ian Rose (Jira)" <ji...@apache.org> on 2022/06/15 17:04:00 UTC

[jira] [Created] (ARROW-16838) Schema inference for pandas extension dtypes fails on indexes

Ian Rose created ARROW-16838:
--------------------------------

             Summary: Schema inference for pandas extension dtypes fails on indexes
                 Key: ARROW-16838
                 URL: https://issues.apache.org/jira/browse/ARROW-16838
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 8.0.0
            Reporter: Ian Rose


Hi! Calling pa.Schema.from_pandas on a DataFrame whose index has a pandas extension dtype (e.g., string[python]) raises an AttributeError:

{code:python}
import pandas as pd
import pyarrow as pa
df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["A", "B"], dtype="string"))
pa.Schema.from_pandas(df)
{code}

produces

{code:python}
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_1827952/3691394220.py in <module>
      1 import pyarrow as pa
      2 df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["A", "B"], dtype="string"))
----> 3 pa.Schema.from_pandas(df)

~/miniconda3/envs/dask/lib/python3.8/site-packages/pyarrow/types.pxi in pyarrow.lib.Schema.from_pandas()

~/miniconda3/envs/dask/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_types(df, preserve_index, columns)
    527             type_ = pa.array(c, from_pandas=True).type
    528         elif _pandas_api.is_extension_array_dtype(values):
--> 529             type_ = pa.array(c.head(0), from_pandas=True).type
    530         else:
    531             values, type_ = get_datetimetz_type(values, c.dtype, None)

AttributeError: 'Index' object has no attribute 'head'

{code}

If I remove the `head` call in dataframe_to_types, or manually convert the index to a Series first, schema inference works.

Reported downstream in https://github.com/dask/dask/issues/9186

Related issue from a couple of years ago: https://issues.apache.org/jira/browse/ARROW-8159
 


