You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/02/08 05:18:00 UTC
[jira] [Updated] (ARROW-4350) [python] pyarrow table convert to pandas dataframe add extra information

     [ https://issues.apache.org/jira/browse/ARROW-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-4350:
--------------------------------
    Fix Version/s: 0.13.0

> [python] pyarrow table convert to pandas dataframe add extra information
> ------------------------------------------------------------------------
>
>                 Key: ARROW-4350
>                 URL: https://issues.apache.org/jira/browse/ARROW-4350
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1, 0.12.0
>            Reporter: yu peng
>            Priority: Major
>             Fix For: 0.13.0
>
>
> {code:java}
> In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})
> In [20]: df.iloc[0].to_dict()
> Out[20]: {'a': [[1], [2]], 'b': 1}
> In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
> Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}
> In [24]: np.array(df.iloc[0].to_dict()['a']).shape
> Out[24]: (2, 1)
> In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
> Out[25]: (2,)
> {code}
> Adding extra array type is not functioning as expected. 
>  
> More importantly, this would fail
>  
> {code:java}
> In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})
> In [109]: df
> Out[109]:
> a b
> 0 [[1, 2], [2, 3]] [1, 2]
> 1 [[1, 2], [2, 3]] [2, 3]
> In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
> ---------------------------------------------------------------------------
> ArrowTypeError Traceback (most recent call last)
> <ipython-input-110-4a09836f807e> in <module>()
> ----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
> 1215 <pyarrow.lib.Table object at 0x7f05d1fb1b40>
> 1216 """
> -> 1217 names, arrays, metadata = pdcompat.dataframe_to_arrays(
> 1218 df,
> 1219 schema=schema,
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
> 379 arrays = [convert_column(c, t)
> 380 for c, t in zip(columns_to_convert,
> --> 381 convert_types)]
> 382 else:
> 383 from concurrent import futures
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
> 374 e.args += ("Conversion failed for column {0!s} with type {1!s}"
> 375 .format(col.name, col.dtype),)
> --> 376 raise e
> 377
> 378 if nthreads == 1:
> ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)