Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/12/01 10:36:00 UTC
[jira] [Assigned] (ARROW-10742) [Python] Mask not checked when creating array from numpy array
[ https://issues.apache.org/jira/browse/ARROW-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche reassigned ARROW-10742:
---------------------------------------------
Assignee: Christian Lundgren
> [Python] Mask not checked when creating array from numpy array
> --------------------------------------------------------------
>
> Key: ARROW-10742
> URL: https://issues.apache.org/jira/browse/ARROW-10742
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Christian Lundgren
> Assignee: Christian Lundgren
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When creating an array from a Python sequence with a mask, Arrow raises an exception unless:
> * mask is a numpy array
> * mask dtype is bool
> * mask has the same length as the sequence
> * mask is 1-dimensional
> https://github.com/apache/arrow/blob/d542482bdc6bea8a449f000bdd74de8990c20015/cpp/src/arrow/python/iterators.h#L98-L124
> However, when creating an array from a numpy array, these checks are not performed, which can lead to surprising results.
> Example:
> {code:python}
> import pytest
> import pyarrow as pa
> import numpy as np
>
> def test_numpy_masked():
>     # This test fails, because no exceptions are raised
>     n = 100
>     obj = np.arange(n)
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape
>
> def test_sequence_masked():
>     # This test passes, since exceptions are raised as expected
>     n = 100
>     obj = np.arange(n).tolist()
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape
>
> if __name__ == "__main__":
>     pytest.main(args=[__file__])
> {code}
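The four checks listed in the description can be approximated in pure Python. This is only a rough sketch: `validate_mask` is a hypothetical helper, not a pyarrow function, and raising ValueError for every failure is an assumption chosen to match the tests quoted above rather than a statement about the exact exception Arrow raises.

```python
import numpy as np


def validate_mask(mask, length):
    """Hypothetical helper mirroring the four checks in the description.

    Raises ValueError (an assumption, chosen to match the tests above)
    if `mask` is unsuitable for a sequence of `length` elements.
    """
    # Check 1: mask must be a numpy array
    if not isinstance(mask, np.ndarray):
        raise ValueError("mask must be a numpy array")
    # Check 2: mask must be 1-dimensional
    if mask.ndim != 1:
        raise ValueError("mask must be 1-dimensional")
    # Check 3: mask must have the same length as the data
    if mask.shape[0] != length:
        raise ValueError("mask length does not match data length")
    # Check 4: mask dtype must be bool
    if mask.dtype != np.bool_:
        raise ValueError("mask must have dtype bool")
    return mask
```

Until the numpy code path performs these checks itself, a caller could run something like `validate_mask(mask, len(obj))` before passing `mask` to `pa.array(obj, mask=mask)`.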
--
This message was sent by Atlassian Jira
(v8.3.4#803005)