You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/22 10:52:16 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #34681: [C++] Array::Equals incorrectly returns true when comparing sized buffer with buffer of size 0

jorisvandenbossche opened a new issue, #34681:
URL: https://github.com/apache/arrow/issues/34681

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   From https://github.com/apache/arrow/issues/34634, which produces an invalid BooleanArray with a data buffer of size 0, when doing checking if this array is equal to another one, this incorrectly returns True:
   
   ```python
   >>> arr1 = pa.array([True, True])
   >>> arr2 = pc.replace_with_mask(pa.chunked_array([arr1]), [False, False], False).chunk(0)
   >>> arr1
   <pyarrow.lib.BooleanArray object at 0x7f8c8522f700>
   [
     true,
     true
   ]
   
   >>> arr2
   <pyarrow.lib.BooleanArray object at 0x7f8c843ce560>
   <Invalid array: Buffer #1 too small in array of type bool and length 2: expected at least 1 byte(s), got 0
   /home/joris/scipy/repos/arrow/cpp/src/arrow/array/validate.cc:116  ValidateLayout(*data.type)>
   
   >>> arr1.equals(arr2)
   True
   ```
   
   It seems it only gives true when it actually matches what the invalid array should be (so maybe we were just writing anyway to (and reading from) non-properly allocated memory?)
   
   ```python
   >>> pa.array([True, True]).equals(arr2)
   True
   >>> pa.array([True, False]).equals(arr2)
   False
   ```
   
   While we should of course fix the root cause of producing such an invalid array (https://github.com/apache/arrow/issues/34634), it also doesn't help that _if_ you for some reason have such invalid array, it's possible to have `equals` work.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #34681: [C++] Array::Equals incorrectly returns true when comparing sized buffer with buffer of size 0

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34681:
URL: https://github.com/apache/arrow/issues/34681#issuecomment-1480343010

   I'm not sure we should fix this.  We could probably add more checks that would check this but those checks would be redundant and it's a slippery slope to try and handle all combinations of invalid arrays.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #34681: [C++] Array::Equals incorrectly returns true when comparing sized buffer with buffer of size 0

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34681:
URL: https://github.com/apache/arrow/issues/34681#issuecomment-1480737829

   I think we might actually also write into the memory (so not just garbage, which makes the test pass consistently for the example), but that's of course just as bad, and indeed when using larger data that gives a segfault at clean-up.
   
   It's true that it is a slippery slope and that you can't start with ensuring the validity of the arrays on any operation. At some point you need to assume you have valid data. 
   I suppose a more appropriate solution here would be to have better testing utilities, like an `assert_equals`-like helper, and then in such a function we could first call `validate()` on the arrays and then `equals`, to avoid such issues. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #34681: [C++] Array::Equals incorrectly returns true when comparing sized buffer with buffer of size 0

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34681:
URL: https://github.com/apache/arrow/issues/34681#issuecomment-1480336651

   I'm pretty certain, in this case, we are comparing values from the left array with garbage memory and getting lucky it doesn't trigger a segmentation fault.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org