You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/29 05:26:05 UTC

[GitHub] [arrow] nevi-me commented on a change in pull request #8546: ARROW-10413: [Rust] [Parquet] Unignore some tests that are passing now

nevi-me commented on a change in pull request #8546:
URL: https://github.com/apache/arrow/pull/8546#discussion_r513983198



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -209,6 +209,61 @@ impl ArrayData {
     }
 }
 
+impl PartialEq for ArrayData {
+    fn eq(&self, other: &Self) -> bool {
+        assert_eq!(
+            self.data_type(),
+            other.data_type(),
+            "Data types not the same"
+        );
+        assert_eq!(self.len(), other.len(), "Lengths not the same");
+        // TODO: when adding tests for this, test that we can compare with arrays that have offsets
+        assert_eq!(self.offset(), other.offset(), "Offsets not the same");
+        assert_eq!(self.null_count(), other.null_count());
+        // compare buffers excluding padding
+        let self_buffers = self.buffers();
+        let other_buffers = other.buffers();
+        assert_eq!(self_buffers.len(), other_buffers.len());
+        self_buffers.iter().zip(other_buffers).for_each(|(s, o)| {
+            compare_buffer_regions(
+                s,
+                self.offset(), // TODO mul by data length
+                o,
+                other.offset(), // TODO mul by data len
+            );
+        });
+        // assert_eq!(self.buffers(), other.buffers());
+
+        assert_eq!(self.child_data(), other.child_data());
+        // null arrays can skip the null bitmap, thus only compare if there are no nulls
+        if self.null_count() != 0 || other.null_count() != 0 {
+            compare_buffer_regions(
+                self.null_buffer().unwrap(),
+                self.offset(),
+                other.null_buffer().unwrap(),
+                other.offset(),
+            )
+        }

Review comment:
       Hi @jorgecarleitao I introduced these changes in #8200 because we were failing on integration tests where:
   
   * We create an Arrow array which has no nulls from JSON, and don't allocate a null buffer (as it's optional http://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps)
   * When reading IPC files & streams, the C++ always has the null buffer allocated
   
   So without changing that to account for the above, both the Parquet roundtrips and integration tests were failing even though we were trying to compare logically correct data.
   
   On panics, there's a slight leakage of intention, in the sense that much of the equality was introduced to aid in testing, but because we don't yet have a logging mechanism, something like the below:
   
   ```rust
   let a: Int32Array = ...;
   let b: Int32Array = ..;;
   // try compare in tests
   assert_eq!(&a, &b);
   ```
   
   Would fail with a panic on the assert if the arrays are unequal, but because the comparison would return a `bool`, one wouldn't know where the failure occurred.
   So this becomes tough to see why tests are failing, and hence the practise (I'm fine with calling it bad as I still do it) of adding asserts.
   I'll work with you on #8541 to find a solution that can benefit both us when testing, and downstream users if they want a panic-free `Array:Ref:equals(other: &ArrayRef) -> bool;` kind of interface.
   
   > If it is trying to achieve logical equality, then it has a lot of issues as buffer and child equality are not that simple.
   
   I appreciate this, and the intention is to achieve logical equality even with slices. I believe that the path that I took, while still incomplete, would help us to get there.
   
   The lack of offset handling from that `TODO` could have been motivation not to merge the parquet branch in, but with all the refactoring work happening elsewhere, it's really become very inconvenient to keep rebasing and handling merge conflicts every few other days.
   
   I checked that tests were passing on both branches before the merge, and in master after the merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org