Posted to issues@arrow.apache.org by "mroeschke (via GitHub)" <gi...@apache.org> on 2023/05/01 23:21:04 UTC

[GitHub] [arrow] mroeschke opened a new issue, #35385: BUG: `pyarrow.compute.utf8_slice_codeunits` allows for float start, stop arguments

mroeschke opened a new issue, #35385:
URL: https://github.com/apache/arrow/issues/35385

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
   In [3]: pa.__version__
   Out[3]: '11.0.0'
   
   In [4]: arr = pa.array(["abc"])
   
   In [5]: pa.compute.utf8_slice_codeunits(arr, start=1.2, stop=2.2)
   Out[5]: 
   <pyarrow.lib.StringArray object at 0x159dd67a0>
   [
     "b"
   ]
   ```
   
   I would expect this to raise a `ValueError`, since the documentation says these arguments should be ints.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #35385: [Python] BUG: `pyarrow.compute.utf8_slice_codeunits` allows for float start, stop arguments

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35385:
URL: https://github.com/apache/arrow/issues/35385#issuecomment-1531082881

   It seems, though, that if you type the argument as `Py_ssize_t` specifically (rather than a general `int` or `int64_t`), Cython will check for an integer, since that type is meant to be used in indexing contexts.
   That doesn't fully match the C++ signature, though, which explicitly uses `int64_t` rather than a platform-dependent integer type.
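
The difference between the two conversions can be illustrated with a rough pure-Python analogy. This is only a sketch and not Cython's actual generated code: `int()` stands in for a truncating numeric cast, and `operator.index()` stands in for the index-style check that `Py_ssize_t` typing is described as triggering. The function names `truncating_conversion` and `index_conversion` are made up for illustration.

```python
import operator


def truncating_conversion(value):
    # Stand-in for a plain numeric cast: the fractional part is silently dropped.
    return int(value)


def index_conversion(value):
    # Stand-in for an index-style conversion: only objects implementing
    # __index__ are accepted, so floats raise TypeError.
    return operator.index(value)


start = truncating_conversion(1.2)  # 1
stop = truncating_conversion(2.2)   # 2
print("abc"[start:stop])            # b, matching the output in the report

try:
    index_conversion(1.2)
except TypeError as exc:
    print(f"rejected: {exc}")
```

With the truncating conversion, `start=1.2, stop=2.2` behaves like `start=1, stop=2`, which is consistent with the `"b"` result in the report. The index-style conversion rejects the float outright.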
   




[GitHub] [arrow] jorisvandenbossche commented on issue #35385: [Python] BUG: `pyarrow.compute.utf8_slice_codeunits` allows for float start, stop arguments

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35385:
URL: https://github.com/apache/arrow/issues/35385#issuecomment-1531062998

   The API is typed (we create a `CSliceOptions(int64_t start, int64_t stop, int64_t step)`), but I suppose what we see here is Cython (or the generated C code) casting the input to int64, so any float gets truncated.
   
   So this will happen everywhere in our API where we rely on Cython types and do not validate the input manually.
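
As a minimal sketch of what validating the input manually could look like, the helper below checks the arguments in pure Python before they would reach a typed `int64_t` boundary. The names `as_int64_index` and `validated_slice_args` are hypothetical and not part of pyarrow. Note that this sketch raises `TypeError` (what `operator.index` raises) rather than the `ValueError` suggested in the report.

```python
import operator

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def as_int64_index(value, name):
    """Return `value` as a Python int, rejecting floats and out-of-range values.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    try:
        # operator.index() only accepts objects implementing __index__,
        # so 1.2 (and even 1.0) raise TypeError instead of being truncated.
        result = operator.index(value)
    except TypeError:
        raise TypeError(
            f"{name} must be an integer, got {type(value).__name__}"
        ) from None
    # Match the explicit int64_t of the C++ signature, not a platform int.
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{name}={result} does not fit in int64")
    return result


def validated_slice_args(start, stop=None, step=1):
    """Validate slice arguments before passing them to a typed API."""
    start = as_int64_index(start, "start")
    if stop is not None:
        stop = as_int64_index(stop, "stop")
    step = as_int64_index(step, "step")
    return start, stop, step
```

A caller could run `start, stop, step = validated_slice_args(1, 2)` and then pass the results on to `pa.compute.utf8_slice_codeunits(arr, start=start, stop=stop, step=step)`, while `validated_slice_args(1.2, 2.2)` would fail with a `TypeError` instead of silently truncating.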

