Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/04/18 12:10:00 UTC

[jira] [Commented] (ARROW-8498) [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

    [ https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086429#comment-17086429 ] 

Joris Van den Bossche commented on ARROW-8498:
----------------------------------------------

[~uwe] fixed this recently (ARROW-8159), so the fix will be in 0.17.

> [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-8498
>                 URL: https://issues.apache.org/jira/browse/ARROW-8498
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Thomas Buhrmann
>            Priority: Major
>
> While Table.from_pandas() seems to work as expected with extension types,
> Schema.from_pandas() raises an ArrowTypeError:
> {code:python}
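> import pandas as pd
> import pyarrow as pa
>
> # One column per pandas extension dtype: nullable integer, categorical, string
> df = pd.DataFrame({
>     "x": pd.Series([1, 2, None], dtype="Int8"),
>     "y": pd.Series(["a", "b", None], dtype="category"),
>     "z": pd.Series(["ab", "bc", None], dtype="string"),
> })
> print(pa.Table.from_pandas(df).schema)  # works
> print(pa.Schema.from_pandas(df))  # raises ArrowTypeError on 0.16.0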
> {code}
>  
> Results in:
> {noformat}
> x: int8
> y: dictionary<values=string, indices=int8, ordered=0>
> z: string
> metadata
> --------
> {b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
>             b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
>             b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
>             b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
>             b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
>             b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
>             b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
>             b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
>             b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
>             b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
>             b'as_version": "1.0.3"}'}
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> ...
> ArrowTypeError: Did not pass numpy.dtype object
> {noformat}
> I'd imagine Table.from_pandas(df).schema and Schema.from_pandas(df) should result in the exact same object?


