Posted to dev@arrow.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2018/07/07 15:29:00 UTC

[jira] [Created] (ARROW-2806) [Python] Inconsistent handling of np.nan

Uwe L. Korn created ARROW-2806:
----------------------------------

             Summary: [Python] Inconsistent handling of np.nan
                 Key: ARROW-2806
                 URL: https://issues.apache.org/jira/browse/ARROW-2806
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: Uwe L. Korn
             Fix For: 0.10.0


Currently we handle {{np.nan}} differently between having a list or a numpy array as an input to {{pa.array()}}:

{code}
>>> pa.array(np.array([1, np.nan]))
<pyarrow.lib.DoubleArray object at 0x11680bea8>
[
  1.0,
  nan
]

>>> pa.array([1., np.nan])
<pyarrow.lib.DoubleArray object at 0x10bdacbd8>
[
  1.0,
  NA
]
{code}

I would actually consider the second result the correct one, especially once one casts this to an integer column: there the first variant produces a column containing INT_MIN, while the second produces a real null.
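The difference can be sketched as follows (a minimal illustration, assuming numpy and pyarrow are installed; the exact results depend on the pyarrow version, since this report is against 0.9.0):

{code}
import numpy as np
import pyarrow as pa

# Same logical input via the two code paths this issue compares.
arr_from_numpy = pa.array(np.array([1.0, np.nan]))  # nan kept as a float value
arr_from_list = pa.array([1.0, np.nan])             # nan treatment is what this issue questions

# null_count distinguishes a real null slot from a nan payload:
# nan is a valid float value, so it does not count as null.
print(arr_from_numpy.null_count)
print(arr_from_list.null_count)

# Casting to an integer type is where the difference bites: a null slot
# stays null, while a nan has no integer representation, so an unsafe
# cast can yield a sentinel such as INT_MIN.
print(arr_from_numpy.cast(pa.int64(), safe=False))
{code}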

However, in {{test_array_conversions_no_sentinel_values}} we check that {{np.nan}} does not produce a null.

Even weirder: 

{code}
>>> df = pd.DataFrame({'a': [1., None]})
>>> df
     a
0  1.0
1  NaN
>>> pa.Table.from_pandas(df).column(0)
<Column name='a' type=DataType(double)>
chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
[
  1.0,
  NA
]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)