Posted to jira@arrow.apache.org by "Artem KOZHEVNIKOV (Jira)" <ji...@apache.org> on 2020/11/04 10:21:00 UTC

[jira] [Updated] (ARROW-10494) .take silently overflow on list array (when casting to large_list is needed)

     [ https://issues.apache.org/jira/browse/ARROW-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem KOZHEVNIKOV updated ARROW-10494:
--------------------------------------
    Summary: .take silently overflow on list array (when casting to large_list is needed)  (was: list_array.take silently overflow on list array (when casting to large_list is needed))

> .take silently overflow on list array (when casting to large_list is needed)
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-10494
>                 URL: https://issues.apache.org/jira/browse/ARROW-10494
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: Artem KOZHEVNIKOV
>            Priority: Major
>
> Reproducer below:
> {code:python}
> import numpy as np
> import pyarrow as pa
> arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)])  # lists of length 0..5, last int32 offset is 15
> nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]  # 286331153
> indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))  # result needs 15 * nb_repeat = 2**32 - 1 child values
> big_arr = arr.take(indices)
> print(big_arr.offsets[-5:])
> big_arr.validate()  # hopefully this can catch it
> {code}
> Output:
> {code}
> [
>   -21,
>   -16,
>   -11,
>   -6,
>   -1
> ]
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-1-09503f9cbb04> in <module>
>       6 big_arr = arr.take(indices)
>       7 print(big_arr.offsets[-5:])
> ----> 8 big_arr.validate()
> /opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.validate()
> /opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Negative offsets in list array
> {code}
> and it works fine with large_list (as expected):
> {code:python}
> import numpy as np
> import pyarrow as pa
> arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)], type=pa.large_list(pa.int8()))  # same data, int64 offsets
> nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
> indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
> big_arr = arr.take(indices)
> print(big_arr.offsets[-5:])
> big_arr.validate()
> {code}
> Output:
> {code}
> [
>   4294967275,
>   4294967280,
>   4294967285,
>   4294967290,
>   4294967295
> ]
> {code}
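
The two outputs are consistent with a plain 32-bit wraparound of the offsets, and the arithmetic can be checked without allocating the multi-GiB arrays. The sketch below is a minimal pure-Python model, not pyarrow code: `wrap_int32` is a hypothetical helper that assumes the int32 offsets wrap as two's-complement values, which is what the printed numbers suggest.

{code:python}
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap_int32(value: int) -> int:
    """Reinterpret a Python int as a two's-complement 32-bit integer (hypothetical helper)."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN

# List lengths in the reproducer: [], [0], [0, 1], ..., [0, 1, 2, 3, 4]
lengths = list(range(6))
total_per_copy = sum(lengths)          # 15, i.e. arr.offsets[-1]
nb_repeat = 2**32 // total_per_copy    # 286331153

# np.repeat(np.arange(6), nb_repeat) emits each row nb_repeat times in order,
# so the last nb_repeat lists of the result all have length 5.
total = total_per_copy * nb_repeat     # 4294967295 child values = 2**32 - 1

# The last five offsets are spaced 5 apart and end at the total length.
last_offsets = [total - 5 * k for k in range(4, -1, -1)]
print(last_offsets)                           # what large_list (int64 offsets) stores
print([wrap_int32(o) for o in last_offsets])  # what int32 offsets wrap around to
print(total > INT32_MAX)                      # the result cannot fit in list offsets
{code}

The unwrapped values reproduce the large_list output exactly, and wrapping them gives the negative offsets of the list output, ending at -1 because 2**32 - 1 is -1 in two's-complement int32.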


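Until take() reports this itself, a caller can predict the overflow before calling it. A minimal sketch, assuming the input offsets and the indices are available as Python lists (for example via `arr.offsets.to_pylist()` and `indices.to_pylist()`, which is only practical for small inputs); `taken_child_length` and `needs_large_list` are hypothetical helpers, not pyarrow API, and null indices are not handled.

{code:python}
INT32_MAX = 2**31 - 1  # largest value a 32-bit list offset can hold

def taken_child_length(offsets, indices):
    """Child values a list take() would emit (hypothetical helper).

    offsets: list offsets of the input array (len(arr) + 1 entries)
    indices: row indices passed to take(); nulls are not handled here
    """
    return sum(offsets[i + 1] - offsets[i] for i in indices)

def needs_large_list(offsets, indices):
    """True if the taken result's last offset would not fit in int32 (hypothetical helper)."""
    return taken_child_length(offsets, indices) > INT32_MAX

# Offsets of the reproducer's six lists (lengths 0..5)
small_offsets = [0, 0, 1, 3, 6, 10, 15]
print(needs_large_list(small_offsets, [0, 5, 5]))  # only 10 child values

# A single list of 2**30 values taken twice already needs 2**31 > INT32_MAX
huge_offsets = [0, 2**30]
print(needs_large_list(huge_offsets, [0, 0]))
{code}

For the reproducer itself the indices need not be materialized: every row is repeated nb_repeat times, so the result holds 15 * nb_repeat = 2**32 - 1 child values, well above INT32_MAX. When the check is true, building the array as large_list (as in the second reproducer) keeps the offsets in 64 bits.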

--
This message was sent by Atlassian Jira
(v8.3.4#803005)