Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/21 09:31:35 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #34634: [Python] pc.replace_with_mask produces invalid result when array is a boolean ChunkedArray

jorisvandenbossche commented on issue #34634:
URL: https://github.com/apache/arrow/issues/34634#issuecomment-1477517937

   > According [to documentation](https://arrow.apache.org/docs/python/generated/pyarrow.compute.replace_with_mask.html) the following should be fulfilled: `len(replacements) == sum(mask == true)`.
   > In this case, `len(replacements) == 2` and `sum(mask==True) == 1`.
   
   Indeed, the usage in the top-post reproducer is wrong, but we currently silently ignore a `replacements` array that is too long (see also https://github.com/apache/arrow/issues/32436). And the same issue also occurs if you provide the correct length:
   
   ```
   In [12]: import pyarrow as pa
       ...: import pyarrow.compute as pc
       ...: 
       ...: arr = pa.chunked_array([[True, True]])
       ...: mask = pa.array([False, True])
       ...: replacements = pa.array([False])
   
   In [13]: pc.replace_with_mask(arr, mask, replacements)
   Out[13]: 
   <pyarrow.lib.ChunkedArray object at 0x7f027c727650>
   [
   <Invalid array: Buffer #1 too small in array of type bool and length 2: expected at least 1 byte(s), got 0
   ]
   ```
   
   Support for chunked arrays in this kernel is generally limited (see https://github.com/apache/arrow/issues/31665), but providing a chunked array for the _input_ array should now work. The above example also works if I use a different type (e.g. int64) instead of boolean. So this seems to be an issue specifically with a boolean input array.
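
For reference, the length contract quoted at the top and the result one would expect from the reproducer can be sketched in plain Python. This is only a rough model of the documented semantics, not pyarrow's implementation: `replace_with_mask_reference` is a hypothetical helper, it ignores nulls in the mask, and it raises on a length mismatch, whereas the kernel currently ignores a too-long `replacements` array as noted above.

```python
def replace_with_mask_reference(values, mask, replacements):
    """Pure-Python model of the documented contract (hypothetical helper).

    Assumes plain Python lists and a mask without nulls.
    """
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")

    # Documented requirement: len(replacements) == number of True mask entries.
    n_true = sum(1 for m in mask if m)
    if len(replacements) != n_true:
        raise ValueError(
            f"expected {n_true} replacement(s), got {len(replacements)}"
        )

    # Consume the replacements in order, one per True mask entry.
    remaining = iter(replacements)
    return [next(remaining) if m else v for v, m in zip(values, mask)]


# Same inputs as the reproducer above, as plain lists.
print(replace_with_mask_reference([True, True], [False, True], [False]))
# [True, False]
```

Under these assumptions, the boolean case should give `[True, False]` in the same way that an int64 input gives the value-replaced result, which is what the chunked boolean input fails to produce.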
   
   

