You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/22 12:39:43 UTC

[GitHub] [arrow] nevi-me opened a new pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child

nevi-me opened a new pull request #8739:
URL: https://github.com/apache/arrow/pull/8739


   This change aligns with the spec, in that a struct array's nulls should take precedence over a child array's nulls.
   We carry across the parent's null buffer, and merge it with the child's.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r532032623



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            equal_range(
+                lhs_values,
+                rhs_values,
+                lhs_merged_nulls.as_ref(),
+                rhs_merged_nulls.as_ref(),
+                lhs_start,
+                rhs_start,
+                len,
+            )
         })
 }
 
 pub(super) fn struct_equal(
     lhs: &ArrayData,
     rhs: &ArrayData,
+    lhs_nulls: Option<&Buffer>,
+    rhs_nulls: Option<&Buffer>,
     lhs_start: usize,
     rhs_start: usize,
     len: usize,
 ) -> bool {
-    if lhs.null_count() == 0 && rhs.null_count() == 0 {
-        equal_values(lhs, rhs, lhs_start, rhs_start, len)
+    // we have to recalculate null counts from the null bitmaps
+    let lhs_null_count = count_nulls(lhs_nulls, lhs_start, len);
+    let rhs_null_count = count_nulls(rhs_nulls, rhs_start, len);
+    if lhs_null_count == 0 && rhs_null_count == 0 {
+        equal_values(lhs, rhs, lhs_nulls, rhs_nulls, lhs_start, rhs_start, len)
     } else {
         // with nulls, we need to compare item by item whenever it is not null
         (0..len).all(|i| {

Review comment:
       @nevi-me  I think I was confused -- I was trying to offer some way to avoid @jorgecarleitao 's concern of double merge concerns but now that I re-read my comments I am not sure that they make any sense. Sorry for the confusion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r528333746



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            equal_range(
+                lhs_values,
+                rhs_values,
+                lhs_merged_nulls.as_ref(),
+                rhs_merged_nulls.as_ref(),
+                lhs_start,
+                rhs_start,
+                len,
+            )
         })
 }
 
 pub(super) fn struct_equal(
     lhs: &ArrayData,
     rhs: &ArrayData,
+    lhs_nulls: Option<&Buffer>,
+    rhs_nulls: Option<&Buffer>,
     lhs_start: usize,
     rhs_start: usize,
     len: usize,
 ) -> bool {
-    if lhs.null_count() == 0 && rhs.null_count() == 0 {
-        equal_values(lhs, rhs, lhs_start, rhs_start, len)
+    // we have to recalculate null counts from the null bitmaps
+    let lhs_null_count = count_nulls(lhs_nulls, lhs_start, len);
+    let rhs_null_count = count_nulls(rhs_nulls, rhs_start, len);
+    if lhs_null_count == 0 && rhs_null_count == 0 {
+        equal_values(lhs, rhs, lhs_nulls, rhs_nulls, lhs_start, rhs_start, len)
     } else {
         // with nulls, we need to compare item by item whenever it is not null
         (0..len).all(|i| {

Review comment:
       Wouldn't it be easier if we took into account the nulls on this statement here:
   
   ```rust
   let lhs_is_null = lhs.is_null(lhs_pos);
   let rhs_is_null = rhs.is_null(rhs_pos);
   ```
   
   by replacing it with
   
   ```rust
   let lhs_is_null = lhs.is_null(lhs_pos) && lhs.nullbitmap.isnull(lhs_pos);
   let rhs_is_null = rhs.is_null(rhs_pos) && rhs.nullbitmap.isnull(rhs_pos);
   ```
   
   (still keeping the re-count in place, to avoid wrongfully optimizing like this PR fixes)
   
   The advantage is that for all types except `struct`, this PR merges two equal bitmaps twice (all types except `struct` now pass `null_buffer()` to `equal_range`. E.g. in `list`:
   
   ```
               lhs_values,
               rhs_values,
               lhs_values.null_buffer(),
               rhs_values.null_buffer(),
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r529030845



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {

Review comment:
       I don't know how much `c.clone()` costs below, but if we want to avoid `clone()` you could potentially use a structure like this:
   
   ```diff
   alamb@MacBook-Pro rust % git diff
   git diff
   WARNING: terminal is not fully functional
   -  (press RETURN)
   diff --git a/rust/arrow/src/array/equal/structure.rs b/rust/arrow/src/array/equal/structure.rs
   index cad7a9ea5..dc29d44ba 100644
   --- a/rust/arrow/src/array/equal/structure.rs
   +++ b/rust/arrow/src/array/equal/structure.rs
   @@ -28,6 +28,9 @@ fn equal_values(
        rhs_start: usize,
        len: usize,
    ) -> bool {
   +    let mut temp_lhs : Option<Buffer> = None;
   +    let mut temp_rhs : Option<Buffer> = None;
   +
        lhs.child_data()
            .iter()
            .zip(rhs.child_data())
   @@ -35,27 +38,29 @@ fn equal_values(
                // merge the null data
                let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
                    (None, None) => None,
   -                (None, Some(c)) => Some(c.clone()),
   -                (Some(p), None) => Some(p.clone()),
   +                (None, Some(c)) => Some(c),
   +                (Some(p), None) => Some(p),
                    (Some(p), Some(c)) => {
                        let merged = (p & c).unwrap();
   -                    Some(merged)
   +                    temp_lhs = Some(merged);
   +                    temp_lhs.as_ref()
                    }
                };
                let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
                    (None, None) => None,
   -                (None, Some(c)) => Some(c.clone()),
   -                (Some(p), None) => Some(p.clone()),
   +                (None, Some(c)) => Some(c),
   +                (Some(p), None) => Some(p),
                    (Some(p), Some(c)) => {
                        let merged = (p & c).unwrap();
   -                    Some(merged)
   +                    temp_rhs = Some(merged);
   +                    temp_rhs.as_ref()
                    }
                };
                equal_range(
                    lhs_values,
                    rhs_values,
   -                lhs_merged_nulls.as_ref(),
   -                rhs_merged_nulls.as_ref(),
   +                lhs_merged_nulls,
   +                rhs_merged_nulls,
                    lhs_start,
                    rhs_start,
                    len,
   ```
   
   (Neville edited this add diff formatting to the code)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#issuecomment-731743163


   @jorgecarleitao @alamb I've only implemented this for `<struct<primitive>>` for now, I'd like to get some feedback on whether the approach that I'm taking is fine.
   
   The change mainly keeps track of the null buffer of `ArrayData` for arrays, and combines the array's null buffer with the child's, when a nested type is encountered. I use `BitAnd` to combine the buffers.
   
   Where we need null counts for comparisons, we have to recalculate the counts from the separate null buffer, as they might have been combined with the parent's null buffer (buffer-ception? let me know if my sentence is unclear).
   
   I'll work on the below so we have sufficient test coverage:
   * `<struct<struct<primitive>>>`
   * `<struct<struct<dictionary>>>`
   * `<list<primitive>>`
   * `<list<dictionary>>`
   * `<list<struct<list<primitive>>>>`
   
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#issuecomment-734986453


   @alamb @jorgecarleitao this is blocking some work that I'm doing on parquet, so I'd like to limit the scope of the PR to just nested structs, as I want to submit a PR that correctly reads nested arrow structs in parquet.
   I'm also a bit unsure if the struct null inheritance semantics also apply to lists, so I'd like to defer that to follow-ups (as I'd need to do the work when writing nested arrow lists to parquet).
   
   Are you fine with reviewing the PR as is just for nested structs? I believe I've responded to or addressed comments raised so far, but I'll address more where they come, or where i haven't covered everything as yet.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#issuecomment-735224685


   > I am going to retrigger the tests on github and then plan to merge this PR if they pass
   
   @alamb something's filling up our space on CI, so the tests might still fail. I tried looking at it in the morning, but ended up parking it for later so I could do other work.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8739:
URL: https://github.com/apache/arrow/pull/8739


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r531148917



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            equal_range(
+                lhs_values,
+                rhs_values,
+                lhs_merged_nulls.as_ref(),
+                rhs_merged_nulls.as_ref(),
+                lhs_start,
+                rhs_start,
+                len,
+            )
         })
 }
 
 pub(super) fn struct_equal(
     lhs: &ArrayData,
     rhs: &ArrayData,
+    lhs_nulls: Option<&Buffer>,
+    rhs_nulls: Option<&Buffer>,
     lhs_start: usize,
     rhs_start: usize,
     len: usize,
 ) -> bool {
-    if lhs.null_count() == 0 && rhs.null_count() == 0 {
-        equal_values(lhs, rhs, lhs_start, rhs_start, len)
+    // we have to recalculate null counts from the null bitmaps
+    let lhs_null_count = count_nulls(lhs_nulls, lhs_start, len);
+    let rhs_null_count = count_nulls(rhs_nulls, rhs_start, len);
+    if lhs_null_count == 0 && rhs_null_count == 0 {
+        equal_values(lhs, rhs, lhs_nulls, rhs_nulls, lhs_start, rhs_start, len)
     } else {
         // with nulls, we need to compare item by item whenever it is not null
         (0..len).all(|i| {

Review comment:
       @alamb if we pass `None` where we don't have a struct, wouldn't we be describing a different position, which might be incorrect?
   A `None` for the `Option<&Buffer>` indicates that there are no nulls, as arrays are permitted to omit the null buffer/bitmap altogether if there are no nulls. So if we pass `None` for `lhs_nulls`, we shouldn't have to look at `lhs`, as the presumption is that we've already extracted the null buffer, and there wasn't any.
   
   Perhaps I don't understand you




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#issuecomment-731741901


   https://issues.apache.org/jira/browse/ARROW-10684


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#issuecomment-735223342


   The CI failures seem unrelated
   
   For example: 
   
   https://github.com/apache/arrow/pull/8739/checks?check_run_id=1465465859
   
   ```
   test result: ok. 56 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
   
   + popd
   /d/a/arrow/arrow/rust /d/a/arrow/arrow
   + pushd arrow
   /d/a/arrow/arrow/rust/arrow /d/a/arrow/arrow/rust /d/a/arrow/arrow
   + cargo run --example builders
      Compiling arrow v3.0.0-SNAPSHOT (D:\a\arrow\arrow\rust\arrow)
   LLVM ERROR: IO failure on output stream: no space on device
   error: could not compile `arrow`
   ```
   
   I am going to retrigger the tests on github and then plan to merge this PR if they pass


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#issuecomment-735384692


   @jorgecarleitao may you please take a look at this again when you have a chance. I opted not to address lists for now, as their semantics might be fine. I'll look into other types as part of ARROW-10766, which I'll be working on in the evenings in the coming week.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r531781486



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -15,45 +15,97 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use crate::array::ArrayData;
+use crate::{
+    array::data::count_nulls, array::ArrayData, buffer::Buffer, util::bit_util::get_bit,
+};
 
 use super::equal_range;
 
+/// Compares the values of two [ArrayData] starting at `lhs_start` and `rhs_start` respectively
+/// for `len` slots. The null buffers `lhs_nulls` and `rhs_nulls` inherit parent nullability.
+///
+/// If an array is a child of a struct or list, the array's nulls have to be merged with the parent.
+/// This then affects the null count of the array, thus the merged nulls are passed separately
+/// as `lhs_nulls` and `rhs_nulls` variables to functions.
+/// The nulls are merged with a bitwise AND, and null counts are recomputed wheer necessary.
 fn equal_values(
     lhs: &ArrayData,
     rhs: &ArrayData,
+    lhs_nulls: Option<&Buffer>,
+    rhs_nulls: Option<&Buffer>,
     lhs_start: usize,
     rhs_start: usize,
     len: usize,
 ) -> bool {
+    let mut temp_lhs: Option<Buffer> = None;
+    let mut temp_rhs: Option<Buffer> = None;
+
     lhs.child_data()
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c),
+                (Some(p), None) => Some(p),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    temp_lhs = Some(merged);
+                    temp_lhs.as_ref()
+                }
+            };
+            let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c),
+                (Some(p), None) => Some(p),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    temp_rhs = Some(merged);
+                    temp_rhs.as_ref()
+                }
+            };
+            equal_range(
+                lhs_values,
+                rhs_values,
+                lhs_merged_nulls,
+                rhs_merged_nulls,
+                lhs_start,
+                rhs_start,
+                len,
+            )
         })
 }
 
 pub(super) fn struct_equal(
     lhs: &ArrayData,
     rhs: &ArrayData,
+    lhs_nulls: Option<&Buffer>,
+    rhs_nulls: Option<&Buffer>,
     lhs_start: usize,
     rhs_start: usize,
     len: usize,
 ) -> bool {
-    if lhs.null_count() == 0 && rhs.null_count() == 0 {
-        equal_values(lhs, rhs, lhs_start, rhs_start, len)
+    // we have to recalculate null counts from the null buffers
+    let lhs_null_count = count_nulls(lhs_nulls, lhs_start, len);
+    let rhs_null_count = count_nulls(rhs_nulls, rhs_start, len);
+    if lhs_null_count == 0 && rhs_null_count == 0 {
+        equal_values(lhs, rhs, lhs_nulls, rhs_nulls, lhs_start, rhs_start, len)
     } else {
+        // get a ref of the null buffer bytes, to use in testing for nullness

Review comment:
       @jorgecarleitao does this address your comment? I was incorrectly using the array's null buffer, instead of checking nullness from the merged one. If it doesn't, then it means I didn't understand what you were suggesting.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r531155130



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {

Review comment:
       Thanks! Suggestion taken




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r531780455



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -15,13 +15,15 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use crate::array::ArrayData;
+use crate::{array::data::count_nulls, array::ArrayData, buffer::Buffer};
 
 use super::equal_range;
 
 fn equal_values(

Review comment:
       I've added the details on the same function in `crate::array::equal::mod`. I've copied the same doc from the function in `mod` to here, but I think it's redundant as the functions are private, as long as the `mod` one is sufficiently documented.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8739:
URL: https://github.com/apache/arrow/pull/8739#discussion_r529026844



##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -15,13 +15,15 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use crate::array::ArrayData;
+use crate::{array::data::count_nulls, array::ArrayData, buffer::Buffer};
 
 use super::equal_range;
 
 fn equal_values(

Review comment:
       Given the number of parameters this function has, it might be worth adding doc comments on it (especially as `lhs_nulls` and `rhs_nulls` now have non trivial semantics (nullability is checked by `AND`ing them together)

##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
+                (None, None) => None,
+                (None, Some(c)) => Some(c.clone()),
+                (Some(p), None) => Some(p.clone()),
+                (Some(p), Some(c)) => {
+                    let merged = (p & c).unwrap();
+                    Some(merged)
+                }
+            };
+            equal_range(
+                lhs_values,
+                rhs_values,
+                lhs_merged_nulls.as_ref(),
+                rhs_merged_nulls.as_ref(),
+                lhs_start,
+                rhs_start,
+                len,
+            )
         })
 }
 
 pub(super) fn struct_equal(
     lhs: &ArrayData,
     rhs: &ArrayData,
+    lhs_nulls: Option<&Buffer>,
+    rhs_nulls: Option<&Buffer>,
     lhs_start: usize,
     rhs_start: usize,
     len: usize,
 ) -> bool {
-    if lhs.null_count() == 0 && rhs.null_count() == 0 {
-        equal_values(lhs, rhs, lhs_start, rhs_start, len)
+    // we have to recalculate null counts from the null bitmaps
+    let lhs_null_count = count_nulls(lhs_nulls, lhs_start, len);
+    let rhs_null_count = count_nulls(rhs_nulls, rhs_start, len);
+    if lhs_null_count == 0 && rhs_null_count == 0 {
+        equal_values(lhs, rhs, lhs_nulls, rhs_nulls, lhs_start, rhs_start, len)
     } else {
         // with nulls, we need to compare item by item whenever it is not null
         (0..len).all(|i| {

Review comment:
       An alternate thing to do is to pass in `None` for all types except `struct`
   
   https://github.com/apache/arrow/pull/8739/files#diff-6fca2b5d7c8b47ba17e7c93c2eaa2cf7f6b65392d0573254fee593449b8adf46R43
   
   

##########
File path: rust/arrow/src/array/equal/structure.rs
##########
@@ -30,19 +32,51 @@ fn equal_values(
         .iter()
         .zip(rhs.child_data())
         .all(|(lhs_values, rhs_values)| {
-            equal_range(lhs_values, rhs_values, lhs_start, rhs_start, len)
+            // merge the null data
+            let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {

Review comment:
       I don't know how much `c.clone()` costs below, but if we want to avoid `clone()` you could potentially use a structure like this:
   
   ```
   alamb@MacBook-Pro rust % git diff
   git diff
   WARNING: terminal is not fully functional
   -  (press RETURN)
   diff --git a/rust/arrow/src/array/equal/structure.rs b/rust/arrow/src/array/equal/structure.rs
   index cad7a9ea5..dc29d44ba 100644
   --- a/rust/arrow/src/array/equal/structure.rs
   +++ b/rust/arrow/src/array/equal/structure.rs
   @@ -28,6 +28,9 @@ fn equal_values(
        rhs_start: usize,
        len: usize,
    ) -> bool {
   +    let mut temp_lhs : Option<Buffer> = None;
   +    let mut temp_rhs : Option<Buffer> = None;
   +
        lhs.child_data()
            .iter()
            .zip(rhs.child_data())
   @@ -35,27 +38,29 @@ fn equal_values(
                // merge the null data
                let lhs_merged_nulls = match (lhs_nulls, lhs_values.null_buffer()) {
                    (None, None) => None,
   -                (None, Some(c)) => Some(c.clone()),
   -                (Some(p), None) => Some(p.clone()),
   +                (None, Some(c)) => Some(c),
   +                (Some(p), None) => Some(p),
                    (Some(p), Some(c)) => {
                        let merged = (p & c).unwrap();
   -                    Some(merged)
   +                    temp_lhs = Some(merged);
   +                    temp_lhs.as_ref()
                    }
                };
                let rhs_merged_nulls = match (rhs_nulls, rhs_values.null_buffer()) {
                    (None, None) => None,
   -                (None, Some(c)) => Some(c.clone()),
   -                (Some(p), None) => Some(p.clone()),
   +                (None, Some(c)) => Some(c),
   +                (Some(p), None) => Some(p),
                    (Some(p), Some(c)) => {
                        let merged = (p & c).unwrap();
   -                    Some(merged)
   +                    temp_rhs = Some(merged);
   +                    temp_rhs.as_ref()
                    }
                };
                equal_range(
                    lhs_values,
                    rhs_values,
   -                lhs_merged_nulls.as_ref(),
   -                rhs_merged_nulls.as_ref(),
   +                lhs_merged_nulls,
   +                rhs_merged_nulls,
                    lhs_start,
                    rhs_start,
                    len,
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org