Posted to issues@arrow.apache.org by "Vadym Zhernovyi (Jira)" <ji...@apache.org> on 2021/08/16 10:44:00 UTC

[jira] [Created] (ARROW-13632) [Python] Filter mask is always applied to elements at the beginning of FixedSizeListArray when filtering a slice

Vadym Zhernovyi created ARROW-13632:
---------------------------------------

             Summary: [Python] Filter mask is always applied to elements at the beginning of FixedSizeListArray when filtering a slice
                 Key: ARROW-13632
                 URL: https://issues.apache.org/jira/browse/ARROW-13632
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 5.0.0
         Environment: Windows 10, Python 3.9
            Reporter: Vadym Zhernovyi


When FixedSizeListArray.filter is called on a slice, the mask is always applied to the first len(slice) elements of the parent array the slice was created from, instead of to the elements the slice actually covers.
* the particular mask doesn't matter
* the slice length and position don't matter
* the number of elements the mask is applied to at the wrong position always equals the length of the slice
* the data type (int32, float, ...) doesn't matter
* the issue does not reproduce with [ListArray|https://arrow.apache.org/docs/python/generated/pyarrow.ListArray.html]
{code:python}
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
>>> import numpy as np
>>> import pyarrow as pa
>>> np.__version__
'1.21.1'
>>> pa.__version__
'5.0.0'
>>> data = [
...     np.zeros(3, dtype='int32'),
...     np.ones(3, dtype='int32'),
...     np.ones(3, dtype='int32') + 1,
...     np.ones(3, dtype='int32') + 2,
...     np.ones(3, dtype='int32') + 3,
...     np.ones(3, dtype='int32') + 4,
...     np.ones(3, dtype='int32') + 5,
...     np.ones(3, dtype='int32') + 6
... ]
>>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
>>> a.filter(pa.array(len(a) * [True]))  # everything is ok 
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6],
  [7, 7, 7]
]
>>> a[3:7].filter(pa.array(4 * [True]))  # outputs elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[3:7].filter(pa.array([True, False, True, False]))  # outputs filtered elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
[
  [0, 0, 0],
  [2, 2, 2]
]
>>> a[4:].filter(pa.array([True, True, True, True]))  # outputs elements of a[0:4] instead of a[4:]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[4:6].filter(pa.array([True, True]))  # outputs elements of a[0:2] instead of a[4:6]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
[
  [0, 0, 0],
  [1, 1, 1]
]
>>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True])) # ListArray slice filtering works ok
<pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
[
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6]
]
{code}
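The outputs above are consistent with the slice's offset being dropped during filtering: a slice is a zero-copy view (offset, length) into the parent's child values, so mask position i should select parent element offset + i, but the results look as if it selects parent element i. The sketch below is a hypothetical, simplified pure-Python model of that view (FixedSizeListModel, slice_range, filter and filter_ignoring_offset are invented names for illustration, not pyarrow APIs or its actual kernel code); it only shows the expected semantics next to the reported symptom.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixedSizeListModel:
    """Hypothetical model of a fixed-size-list array, NOT pyarrow's implementation.

    values:    flat child values, shared between a parent and its slices
    list_size: number of child values per list element
    offset:    position of this array's first element within the parent
    length:    number of list elements visible through this (possibly sliced) view
    """
    values: List[int]
    list_size: int
    offset: int = 0
    length: int = -1

    def __post_init__(self) -> None:
        if self.length < 0:
            self.length = len(self.values) // self.list_size - self.offset

    def slice_range(self, start: int, stop: int) -> "FixedSizeListModel":
        # Zero-copy slice like a[start:stop]: only offset and length change.
        return FixedSizeListModel(self.values, self.list_size,
                                  self.offset + start, stop - start)

    def element(self, physical_index: int) -> List[int]:
        # Child values of the element at this position in the parent array.
        begin = physical_index * self.list_size
        return self.values[begin:begin + self.list_size]

    def filter(self, mask: List[bool]) -> List[List[int]]:
        # Expected behavior: mask position i refers to parent element offset + i.
        assert len(mask) == self.length
        return [self.element(self.offset + i) for i, keep in enumerate(mask) if keep]

    def filter_ignoring_offset(self, mask: List[bool]) -> List[List[int]]:
        # Mimics the reported symptom: the offset is dropped, so the mask is
        # applied to the first len(mask) elements of the parent array.
        assert len(mask) == self.length
        return [self.element(i) for i, keep in enumerate(mask) if keep]
```

With the same data as the reproduction, FixedSizeListModel([v for v in range(8) for _ in range(3)], list_size=3).slice_range(3, 7).filter([True, False, True, False]) gives [[3, 3, 3], [5, 5, 5]], while filter_ignoring_offset gives [[0, 0, 0], [2, 2, 2]], matching the pyarrow 5.0.0 output above.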

--
This message was sent by Atlassian Jira
(v8.3.4#803005)