Posted to jira@arrow.apache.org by "Christian Lundgren (Jira)" <ji...@apache.org> on 2020/11/26 09:50:00 UTC

[jira] [Created] (ARROW-10742) [Python] Mask not checked when creating array from numpy array

Christian Lundgren created ARROW-10742:
------------------------------------------

             Summary: [Python] Mask not checked when creating array from numpy array
                 Key: ARROW-10742
                 URL: https://issues.apache.org/jira/browse/ARROW-10742
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Christian Lundgren


When creating an array from a Python sequence with a mask, Arrow raises an exception unless:
* mask is a numpy array
* mask dtype is bool
* mask has the same length as the sequence
* mask is 1-dimensional

https://github.com/apache/arrow/blob/d542482bdc6bea8a449f000bdd74de8990c20015/cpp/src/arrow/python/iterators.h#L98-L124
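
As a rough illustration, the four conditions listed above could be written as a standalone Python check. This is only a sketch: {{validate_mask}} is a hypothetical helper, not part of pyarrow, and raising {{ValueError}} for every failure is an assumption made here for simplicity. The linked C++ code may report these failures differently.

```python
import numpy as np


def validate_mask(mask, length):
    """Hypothetical helper: apply the four mask checks listed above.

    Not a pyarrow API. ValueError is assumed for every failure.
    """
    # The mask must be a numpy array (not a list or other sequence).
    if not isinstance(mask, np.ndarray):
        raise ValueError("mask must be a numpy array")
    # The mask must have boolean dtype.
    if mask.dtype != np.bool_:
        raise ValueError("mask must have dtype bool")
    # The mask must be 1-dimensional.
    if mask.ndim != 1:
        raise ValueError("mask must be 1-dimensional")
    # The mask must have one entry per element of the data.
    if mask.shape[0] != length:
        raise ValueError("mask must have the same length as the data")
    return mask
```

A caller could run such a check on the mask before handing it to {{pa.array}}, regardless of whether the data is a list or a numpy array.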

However, when creating an array from a numpy array, these checks are not performed, so an invalid mask may be accepted without an error, which can lead to surprising results.

Example:


{code:python}
import pytest
import pyarrow as pa
import numpy as np


def test_numpy_masked():
    n = 100
    obj = np.arange(n)
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape


def test_sequence_masked():
    n = 100
    obj = np.arange(n).tolist()
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape


if __name__ == "__main__":
    pytest.main(args=[__file__])

{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)