Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/21 10:07:57 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #34634: [C++] "replace_with_mask" kernel produces invalid result (wrong validity bitmap) when array is a boolean ChunkedArray

jorisvandenbossche commented on issue #34634:
URL: https://github.com/apache/arrow/issues/34634#issuecomment-1477566537

   The same issue happens for the fill_null_forward/fill_null_backward kernels:
   
   ```
   In [37]: pc.fill_null_forward(pa.array([True, False, None]))
   Out[37]: 
   <pyarrow.lib.BooleanArray object at 0x7f0276f33820>
   [
     true,
     false,
     false
   ]
   
   In [38]: pc.fill_null_forward(pa.chunked_array([[True, False, None]]))
   Out[38]: 
   <pyarrow.lib.ChunkedArray object at 0x7f0276ff56c0>
   [
   <Invalid array: Buffer #1 too small in array of type bool and length 3: expected at least 1 byte(s), got 0
   /home/joris/scipy/repos/arrow/cpp/src/arrow/array/validate.cc:116  ValidateLayout(*data.type)>
   ]
   ```
   
   Looking at the implementation, I think the issue is in how the result arrays are pre-allocated for fixed-width types. This uses `->byte_width()`, but that only works for fixed-width types with at least 1 byte per element. Boolean is bit-packed, so `->byte_width()` returns 0 for it (basically, it should never be used if the type can be boolean):
   
   https://github.com/apache/arrow/blob/69118b2d26f2fbbbe65ad30dbe167d74b70fe791/cpp/src/arrow/compute/kernels/vector_replace.cc#L424-L426
   
   So for boolean arrays, we end up allocating a data buffer of length 0 here.
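
To illustrate the arithmetic, here is a minimal sketch in plain Python (not the actual Arrow C++ code). The helper names `naive_data_nbytes` and `packed_data_nbytes` are hypothetical. The first assumes the size is computed from a whole-byte width, which truncates to 0 for a 1-bit type. The second computes the size from the total number of bits and rounds up, which gives the "at least 1 byte" that validation expects for a boolean array of length 3:

```python
def naive_data_nbytes(length: int, bit_width: int) -> int:
    """Hypothetical: size from a whole-byte width; truncates to 0 for 1-bit types."""
    byte_width = bit_width // 8  # 0 for boolean (bit_width == 1)
    return length * byte_width


def packed_data_nbytes(length: int, bit_width: int) -> int:
    """Hypothetical: size from the total bit count, rounded up to whole bytes."""
    return (length * bit_width + 7) // 8


if __name__ == "__main__":
    # Boolean array of length 3, as in the example above
    print(naive_data_nbytes(3, 1))
    print(packed_data_nbytes(3, 1))
    # A 32-bit type gives the same answer either way
    print(naive_data_nbytes(3, 32), packed_data_nbytes(3, 32))
```

Any real fix would presumably need to follow Arrow's own allocation helpers and padding rules; this only shows why a byte-based width cannot work for bit-packed types.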
   
   ---
   
   @lukemanley or @pstorozenko, if either of you would like to try fixing this, I would be happy to provide some guidance.

