Posted to jira@arrow.apache.org by "Bryan Cutler (Jira)" <ji...@apache.org> on 2020/11/06 23:57:00 UTC

[jira] [Created] (ARROW-10512) [Python] Arrow to Pandas conversion promotes child array to float for NULL values

Bryan Cutler created ARROW-10512:
------------------------------------

             Summary: [Python] Arrow to Pandas conversion promotes child array to float for NULL values
                 Key: ARROW-10512
                 URL: https://issues.apache.org/jira/browse/ARROW-10512
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Bryan Cutler


When a nested Arrow array is converted to Pandas and a child array is an integer type containing NULL values, the child array is promoted to floating point and the NULL values are replaced with NaN. Because the Pandas conversion for nested types produces Python objects anyway, NaN is not necessary; `None` could be inserted instead, which would leave the integer values intact. This affects ListType, MapType, StructType, and other nested types.

{code}
In [4]: s = pd.Series([[1, 2, 3], [4, 5, None]])

In [5]: arr = pa.Array.from_pandas(s)

In [6]: arr.type
Out[6]: ListType(list<item: int64>)

In [7]: arr.to_pandas()
Out[7]: 
0    [1.0, 2.0, 3.0]
1    [4.0, 5.0, nan]
dtype: object
{code}
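
The requested behavior can be sketched in plain Python. This is only an illustration of the idea, not pyarrow's conversion code: the function name `list_array_to_pylists` and its three arguments are hypothetical, and they loosely model a list<item: int64> array as offsets, flattened child values, and a per-value validity mask.

```python
from typing import List, Optional, Sequence


def list_array_to_pylists(
    offsets: Sequence[int],
    values: Sequence[int],
    validity: Sequence[bool],
) -> List[List[Optional[int]]]:
    """Rebuild Python lists from a flattened list-of-int layout.

    Hypothetical helper: null child slots become None, so the
    remaining integers are never promoted to float.
    """
    rows = []
    # Each consecutive pair of offsets delimits one list in the child values.
    for start, stop in zip(offsets[:-1], offsets[1:]):
        row = [values[i] if validity[i] else None for i in range(start, stop)]
        rows.append(row)
    return rows


# Same data as the session above: [[1, 2, 3], [4, 5, None]]
offsets = [0, 3, 6]
values = [1, 2, 3, 4, 5, 0]  # the value stored in a null slot is arbitrary
validity = [True, True, True, True, True, False]

result = list_array_to_pylists(offsets, values, validity)
print(result)  # [[1, 2, 3], [4, 5, None]]
```

For comparison, `arr.to_pylist()` on the array above already yields `None` for the null element and keeps the other items as Python ints, which is the representation this issue suggests for the object-dtype result of `to_pandas()`.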


