Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/12/08 08:32:00 UTC

[jira] [Resolved] (ARROW-10742) [Python] Mask not checked when creating array from numpy array

     [ https://issues.apache.org/jira/browse/ARROW-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche resolved ARROW-10742.
-------------------------------------------
    Resolution: Fixed

Issue resolved by pull request 8775
[https://github.com/apache/arrow/pull/8775]

> [Python] Mask not checked when creating array from numpy array
> --------------------------------------------------------------
>
>                 Key: ARROW-10742
>                 URL: https://issues.apache.org/jira/browse/ARROW-10742
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Christian Lundgren
>            Assignee: Christian Lundgren
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When creating an array from a Python sequence with a mask, Arrow raises an exception unless:
> * the mask is a NumPy array
> * the mask's dtype is bool
> * the mask has the same length as the sequence
> * the mask is one-dimensional
> [https://github.com/apache/arrow/blob/d542482bdc6bea8a449f000bdd74de8990c20015/cpp/src/arrow/python/iterators.h#L98-L124]
> However, when creating an array from a NumPy array, these checks are not performed, which can lead to surprising results.
> Example:
> {code:python}
> import pytest
> import pyarrow as pa
> import numpy as np
>
>
> def test_numpy_masked():
>     # This test fails, because no exceptions are raised
>     n = 100
>     obj = np.arange(n)
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape
>
>
> def test_sequence_masked():
>     # This test passes, since exceptions are raised as expected
>     n = 100
>     obj = np.arange(n).tolist()
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
>     with pytest.raises(ValueError):
>         arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape
>
>
> if __name__ == "__main__":
>     pytest.main(args=[__file__])
> {code}
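For pyarrow versions older than 3.0.0 (the release this issue is fixed in), a caller can apply the same four checks itself before converting a NumPy array. The sketch below is a minimal illustration, not pyarrow's implementation: `validate_mask` and `checked_array` are hypothetical names, and raising `ValueError` for every violation is an assumption chosen to match the reporter's tests (pyarrow's own checks may use different exception classes). The shape check runs before the length check so that a `(1, n)` mask is reported as having the wrong shape rather than the wrong length.

```python
import numpy as np


def validate_mask(mask, length):
    """Hypothetical helper: apply the four mask checks listed in ARROW-10742.

    Raises ValueError on the first violated condition (an assumption chosen
    to match the reporter's tests) and returns the mask unchanged otherwise.
    """
    # The mask must be a NumPy array, not a list or other sequence.
    if not isinstance(mask, np.ndarray):
        raise ValueError("mask must be a numpy array")
    # The mask must have boolean dtype (object arrays of None are rejected).
    if mask.dtype != np.bool_:
        raise ValueError("mask must be boolean dtype")
    # The mask must be one-dimensional. This runs before the length check so
    # that a (1, n) mask is reported as a shape error, not a length mismatch.
    if mask.ndim != 1:
        raise ValueError("mask must be 1D array")
    # The mask needs exactly one entry per value being converted.
    if mask.shape[0] != length:
        raise ValueError("mask must have the same length as the values")
    return mask


def checked_array(values, mask=None, **kwargs):
    """Hypothetical wrapper: validate ``mask`` before calling ``pa.array``."""
    if mask is not None:
        validate_mask(mask, len(values))
    # Imported lazily so validate_mask can be used without pyarrow installed.
    import pyarrow as pa
    return pa.array(values, mask=mask, **kwargs)
```

With pyarrow installed, `checked_array(np.arange(3), mask=np.array([False, True, False]))` behaves like calling `pa.array` directly and yields a three-element array whose second element is null, while each of the three bad masks in the reporter's `test_numpy_masked` raises `ValueError` before pyarrow is called.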



--
This message was sent by Atlassian Jira
(v8.3.4#803005)