Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/04/06 12:27:14 UTC
[GitHub] [arrow] jorisvandenbossche commented on issue #14991: [Python] pyarrow.compute.utf8_slice_codeunits fails when stop=None
jorisvandenbossche commented on issue #14991:
URL: https://github.com/apache/arrow/issues/14991#issuecomment-1498983117
Using my development version to get a bit more informative traceback:
```
ArrowInvalid: Negative buffer resize: -4
/home/joris/scipy/repos/arrow/cpp/src/arrow/memory_pool.cc:931 buffer->Resize(size)
/home/joris/scipy/repos/arrow/cpp/src/arrow/compute/kernels/scalar_string_internal.h:88 ctx->Allocate(max_output_ncodeunits)
/home/joris/scipy/repos/arrow/cpp/src/arrow/compute/exec.cc:920 kernel_->exec(kernel_ctx_, input, &output)
/home/joris/scipy/repos/arrow/cpp/src/arrow/compute/function.cc:276 executor->Execute(input, &listener)
```
So if `max_output_ncodeunits` is -4, we might have run into some integer overflow while calculating that value:
https://github.com/apache/arrow/blob/e2afb8cc04acec4cc14235b0973a5bc86b37d157/cpp/src/arrow/compute/kernels/scalar_string_utf8.cc#L1085-L1097
Reproducing that logic in Python with NumPy int64 scalars:
```
# (numpy was imported earlier in the session as np)
In [11]: import sys
In [12]: stop = np.int64(sys.maxsize)
In [13]: start = np.int64(0)
In [14]: step = np.int64(1)
In [19]: max_slice_codepoints = (stop - start + step - 1) // step
<ipython-input-19-0fd4a0c6e713>:1: RuntimeWarning: overflow encountered in scalar add
max_slice_codepoints = (stop - start + step - 1) // step
<ipython-input-19-0fd4a0c6e713>:1: RuntimeWarning: overflow encountered in scalar subtract
max_slice_codepoints = (stop - start + step - 1) // step
In [20]: max_slice_codepoints
Out[20]: 9223372036854775807
In [21]: 4 * max_slice_codepoints
<ipython-input-21-240e76cab6f7>:1: RuntimeWarning: overflow encountered in scalar multiply
4 * max_slice_codepoints
Out[21]: -4
```
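So indeed, multiple steps of this calculation overflow: first the `+ step` addition wraps around (and the `- 1` wraps back), and then the multiplication by 4 wraps to -4. We will need to refactor this calculation a bit; there are utilities such as `MultiplyWithOverflow` for overflow-safe arithmetic that could be used here.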
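To illustrate the kind of refactoring I have in mind, here is a rough Python sketch (not the actual fix, and not Arrow code). The helper names `checked_mul` and `max_output_ncodeunits` are made up for this example, and it assumes a forward slice with `step > 0` and non-negative `start`/`stop`. It also assumes that capping the result at the input size is acceptable, since a slice cannot produce more bytes than the input holds.

```python
import sys

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_mul(a, b):
    """Hypothetical helper: return (overflowed, result) for a * b as int64.

    Python ints are arbitrary precision, so we can compute the exact
    product and simply check whether it fits in the int64 range.
    """
    result = a * b
    return not (INT64_MIN <= result <= INT64_MAX), result


def max_output_ncodeunits(start, stop, step, input_ncodeunits):
    """Upper bound on output bytes (assumes step > 0, start >= 0, stop >= 0)."""
    if stop <= start:
        return 0
    # Ceiling division written so that no intermediate value exceeds
    # stop, unlike (stop - start + step - 1) // step.
    max_slice_codepoints = (stop - start - 1) // step + 1
    # A UTF-8 codepoint takes at most 4 bytes; check the multiplication.
    overflowed, nbytes = checked_mul(4, max_slice_codepoints)
    if overflowed:
        # Fall back to the input size as the bound.
        return input_ncodeunits
    return min(nbytes, input_ncodeunits)


# The failing case from above: stop defaults to the maximum int64 value.
print(max_output_ncodeunits(0, sys.maxsize, 1, 11))
# A small slice is still bounded by 4 bytes per codepoint.
print(max_output_ncodeunits(0, 2, 1, 11))
```

With `stop = sys.maxsize` the bound falls back to the input size instead of wrapping to a negative number. In C++ the same idea would use the overflow-checking utilities rather than Python's big integers.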