You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/27 20:43:52 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #2194: Use BitChunks in equal_bits

alamb commented on code in PR #2194:
URL: https://github.com/apache/arrow-rs/pull/2194#discussion_r931562098


##########
arrow/src/array/equal/boolean.rs:
##########
@@ -75,20 +77,14 @@ pub(super) fn boolean_equal(
     } else {
         // get a ref of the null buffer bytes, to use in testing for nullness
         let lhs_null_bytes = lhs.null_buffer().as_ref().unwrap().as_slice();
-        let rhs_null_bytes = rhs.null_buffer().as_ref().unwrap().as_slice();
 
         let lhs_start = lhs.offset() + lhs_start;
         let rhs_start = rhs.offset() + rhs_start;
 
-        (0..len).all(|i| {
+        BitIndexIterator::new(lhs_null_bytes, lhs_start, len).all(|i| {

Review Comment:
   Is this correct? It seems like it will would not find positions where `rhs` was null but `lhs` was not. Maybe I mis understand something
   
   It seems like we need to iterate over the positions where either is set -- like `lhs_null_bytes | rhs_null_bytes`?
   
   Maybe there is a lack of test coverage 🤔 



##########
arrow/src/array/data.rs:
##########
@@ -2865,4 +2881,15 @@ mod tests {
         let err = data.validate_values().unwrap_err();
         assert_eq!(err.to_string(), "Invalid argument error: Offset invariant failure: offset at position 1 out of bounds: 3 > 2");
     }
+
+    #[test]
+    fn test_contains_nulls() {
+        let buffer: Buffer =

Review Comment:
   I wonder if using a buffer with more than 64 bits is important (or is that already well enough covered in `BitSliceIterator` tests)?



##########
arrow/src/array/equal/boolean.rs:
##########
@@ -75,20 +77,14 @@ pub(super) fn boolean_equal(
     } else {
         // get a ref of the null buffer bytes, to use in testing for nullness
         let lhs_null_bytes = lhs.null_buffer().as_ref().unwrap().as_slice();
-        let rhs_null_bytes = rhs.null_buffer().as_ref().unwrap().as_slice();
 
         let lhs_start = lhs.offset() + lhs_start;
         let rhs_start = rhs.offset() + rhs_start;
 
-        (0..len).all(|i| {
+        BitIndexIterator::new(lhs_null_bytes, lhs_start, len).all(|i| {

Review Comment:
   Or maybe you can just call `equal_bits` here?



##########
arrow/src/array/data.rs:
##########
@@ -37,6 +38,21 @@ use std::sync::Arc;
 
 use super::equal::equal;
 
+#[inline]
+pub(crate) fn contains_nulls(

Review Comment:
   Given the potential usefulness (and non obviousness) of using `BitSliceIterator` to check for nulls, I wonder what you think about making this a function on `BitSliceIterator` such as `BitSliceIterator::contains_only_unset_bits` or something? 



##########
arrow/src/array/data.rs:
##########
@@ -37,6 +38,21 @@ use std::sync::Arc;
 
 use super::equal::equal;
 
+#[inline]
+pub(crate) fn contains_nulls(

Review Comment:
   Given the never ending confusion about "is this a size in bits or bytes" I recommend making it explicit -- either add a docstring that says `offset` and `len` are in `bits` or perhaps name them `bit_offset` and `bit_len`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org