Posted to dev@arrow.apache.org by "Thomas Buhrmann (JIRA)" <ji...@apache.org> on 2018/06/14 17:47:00 UTC

[jira] [Created] (ARROW-2711) [Python/C++] Pandas-Arrow doesn't roundtrip when column of lists has empty first element

Thomas Buhrmann created ARROW-2711:
--------------------------------------

             Summary: [Python/C++] Pandas-Arrow doesn't roundtrip when column of lists has empty first element
                 Key: ARROW-2711
                 URL: https://issues.apache.org/jira/browse/ARROW-2711
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Thomas Buhrmann


Hi, I thought this had been fixed in the past, but this simple use case still breaks:

 
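{code:python}
import pandas as pd
import pyarrow

# The first list in the column is empty; the second holds a string.
df = pd.DataFrame(dict(x=[[], ["a"]]))
tbl = pyarrow.Table.from_pandas(df)
print(tbl.schema)
{code}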
This results in an incorrectly inferred type of "list<item: null>", even though the second list contains a string:
{noformat}
x: list<item: null>
  child 0, item: null
__index_level_0__: int64
metadata
--------
{b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": [{"na'
            b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
            b'type": "object", "metadata": {"encoding": "UTF-8"}}], "columns":'
            b' [{"name": "x", "field_name": "x", "pandas_type": "list[empty]",'
            b' "numpy_type": "object", "metadata": null}, {"name": null, "fiel'
            b'd_name": "__index_level_0__", "pandas_type": "int64", "numpy_typ'
            b'e": "int64", "metadata": null}], "pandas_version": "0.22.0"}'}
{noformat}
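To make the expected behaviour concrete, here is a minimal pure-Python sketch of the difference between looking only at the first list and scanning the whole column. This is an illustration only, not Arrow's actual inference code, which I have not inspected; `infer_item_type` is a hypothetical helper invented for this example.

```python
def infer_item_type(column):
    """Return the type name of the first non-None item found in any list.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    for values in column:
        for item in values or []:
            if item is not None:
                return type(item).__name__
    # No list in the column contained a usable item.
    return "null"


column = [[], ["a"]]

# Looking only at the first list finds nothing to infer from.
first_only = type(column[0][0]).__name__ if column[0] else "null"

# Scanning past the empty list finds the string in the second list.
scanned = infer_item_type(column)

print(first_only, scanned)
```

The second result is what I would expect the conversion to arrive at for this column.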
When converting the Table back to pandas, all list elements are now None as well:

 
{code:python}
df2 = tbl.to_pandas()
print(df2)
# Output:
#         x
# 0      []
# 1  [None]
{code}
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)