Posted to jira@arrow.apache.org by "Artem KOZHEVNIKOV (Jira)" <ji...@apache.org> on 2020/11/04 10:16:00 UTC

[jira] [Created] (ARROW-10494) take silently overflow on list array (when casting to large_list is needed)

Artem KOZHEVNIKOV created ARROW-10494:
-----------------------------------------

             Summary: take silently overflow on list array (when casting to large_list is needed)
                 Key: ARROW-10494
                 URL: https://issues.apache.org/jira/browse/ARROW-10494
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 2.0.0
            Reporter: Artem KOZHEVNIKOV


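Calling take on a list array silently overflows the 32-bit offsets when the result holds more child values than an int32 offset can address. No error is raised by take itself; the problem only surfaces when validate() is called. Reproducer: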
{code:python}
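import numpy as np
import pyarrow as pa

# Six int8 sub-lists of lengths 0..5 (15 child values in total).
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)])
# Repeat every index often enough that the result needs 2**32 - 1 child values.
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)  # returns without raising
print(big_arr.offsets[-5:])  # the trailing offsets come out negative
big_arr.validate()  # this call does report the invalid offsets

# Output and traceback: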
[
  -21,
  -16,
  -11,
  -6,
  -1
]
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-1-09503f9cbb04> in <module>
      6 big_arr = arr.take(indices)
      7 print(big_arr.offsets[-5:])
----> 8 big_arr.validate()

/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.validate()

/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Negative offsets in list array
{code}
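The negative values are consistent with a plain two's-complement wraparound of the 32-bit offsets. The following is a minimal sketch of that arithmetic in pure Python, without pyarrow; the helper wrap_int32 is a hypothetical name introduced here for illustration, and the assumption that the overflow is a simple int32 wraparound is inferred from the printed offsets, not confirmed from the Arrow source.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def wrap_int32(value):
    """Reinterpret an integer as a two's-complement signed 32-bit value.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


# Sub-list lengths in the reproducer: range(6) gives lengths 0..5.
lengths = list(range(6))
child_values = sum(lengths)        # 15, the value of arr.offsets[-1]
nb_repeat = 2**32 // child_values  # 286331153
total = nb_repeat * child_values   # 4294967295, i.e. 2**32 - 1

# The last selected sub-lists all have length 5, so the last five
# offsets step up to the total in increments of 5.
true_tail = [total - 5 * k for k in (4, 3, 2, 1, 0)]
wrapped_tail = [wrap_int32(v) for v in true_tail]

print(total > INT32_MAX)  # True
print(true_tail)          # [4294967275, 4294967280, 4294967285, 4294967290, 4294967295]
print(wrapped_tail)       # [-21, -16, -11, -6, -1]
```

The wrapped values match the offsets printed by the reproducer, and the unwrapped values are what 64-bit offsets would hold.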

The same code works as expected when the array is created with a large_list type, which uses 64-bit offsets:

{code:python}

import numpy as np
import pyarrow as pa

# Same data as above, but with 64-bit offsets (large_list).
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)], type=pa.large_list(pa.int8()))
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate()  # passes: the offsets fit in int64

# Output:
[
  4294967275,
  4294967280,
  4294967285,
  4294967290,
  4294967295
]
{code}
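Until take handles this case itself, a caller could check beforehand whether the result would fit in 32-bit offsets. The sketch below is one possible guard, written in pure Python so it runs without pyarrow; the function names taken_child_count and needs_large_list are hypothetical, and index_counts (a mapping from index to the number of times it is taken) is an assumed input that could, for example, be derived from the indices with numpy.bincount.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def taken_child_count(offsets, index_counts):
    """Total number of child values selected by a take.

    offsets:      list offsets of the source array (length = len(array) + 1)
    index_counts: mapping {index: number of times that index is taken}
    Both names are assumptions made for this sketch.
    """
    return sum(
        (offsets[i + 1] - offsets[i]) * count
        for i, count in index_counts.items()
    )


def needs_large_list(offsets, index_counts):
    """True if the take result would not fit in int32 offsets."""
    return taken_child_count(offsets, index_counts) > INT32_MAX


# Offsets of the reproducer's array: sub-lists of lengths 0..5.
offsets = [0, 0, 1, 3, 6, 10, 15]

# Taking every element once is harmless.
print(needs_large_list(offsets, {i: 1 for i in range(6)}))  # False

# Taking every element 2**32 // 15 times, as in the reproducer, is not.
nb_repeat = 2**32 // offsets[-1]
print(needs_large_list(offsets, {i: nb_repeat for i in range(6)}))  # True
```

When the check returns True, building the source array with type=pa.large_list(...) as in the second snippet above avoids the overflow.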




