Posted to jira@arrow.apache.org by "Christian Lundgren (Jira)" <ji...@apache.org> on 2020/11/26 09:50:00 UTC
[jira] [Created] (ARROW-10742) [Python] Mask not checked when creating array from numpy array
Christian Lundgren created ARROW-10742:
------------------------------------------
Summary: [Python] Mask not checked when creating array from numpy array
Key: ARROW-10742
URL: https://issues.apache.org/jira/browse/ARROW-10742
Project: Apache Arrow
Issue Type: Improvement
Reporter: Christian Lundgren
When creating an array from a Python sequence with a mask, Arrow raises an exception unless:
* the mask is a numpy array
* the mask's dtype is bool
* the mask has the same length as the sequence
* the mask is 1-dimensional
https://github.com/apache/arrow/blob/d542482bdc6bea8a449f000bdd74de8990c20015/cpp/src/arrow/python/iterators.h#L98-L124
However, when creating an array from a numpy array, these checks are not performed, which can lead to surprising results.
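For illustration, the four checks listed above can be sketched in pure Python. This is only a sketch and not Arrow's implementation: the helper name validate_mask is hypothetical, and raising ValueError for every failure is an assumption chosen to match the exception the tests in this report expect.

```python
import numpy as np


def validate_mask(mask, length):
    """Hypothetical helper: apply the four mask checks described above.

    Raises ValueError (an assumed exception type) if any check fails,
    otherwise returns the mask unchanged.
    """
    if not isinstance(mask, np.ndarray):
        raise ValueError("mask must be a numpy array")
    if mask.ndim != 1:
        raise ValueError("mask must be 1-dimensional")
    if mask.dtype != np.bool_:
        raise ValueError("mask must have dtype bool")
    if mask.shape[0] != length:
        raise ValueError("mask must have the same length as the data")
    return mask
```

Calling such a helper on the mask before handing a numpy array to pa.array would reject the same inputs that are already rejected for Python sequences.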
Example:
{code:python}
import pytest
import pyarrow as pa
import numpy as np


def test_numpy_masked():
    n = 100
    obj = np.arange(n)
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape


def test_sequence_masked():
    n = 100
    obj = np.arange(n).tolist()
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape


if __name__ == "__main__":
    pytest.main(args=[__file__])
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)