You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/13 15:01:08 UTC

[GitHub] [arrow] mqy opened a new pull request #9193: ARROW-11239: [Rust] Demonstrate unit test failure on array::transform::tests::test_struct [WILL BE CLOSED]

mqy opened a new pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   I just got the message from @alamb that we are closing to the 3.0.0 release, that's great!  So this is the gift :)
   
   This PR shows a possible bug that I failed to fix in the past hours.  So would you please have a check and fix.
   I had checked the tests for string, they are looking good.  I guess perhaps something goes wrong when `MutableArrayData` extends the null bits.
   
   Ping @alamb @jorgecarleitao @nevi-me


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558865481



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -219,17 +219,15 @@ impl ArrayData {
     ///
     /// Panics if `offset + length > self.len()`.
     pub fn slice(&self, offset: usize, length: usize) -> ArrayData {
-        assert!((offset + length) <= self.len());
-
-        let mut new_data = self.clone();
-
-        new_data.len = length;
-        new_data.offset = offset + self.offset;
-
-        new_data.null_count =
-            count_nulls(new_data.null_buffer(), new_data.offset, new_data.len);
+        self.copy_range(offset, length)
+    }
 
-        new_data
+    fn copy_range(&self, offset: usize, length: usize) -> ArrayData {
+        assert!((offset + length) <= self.len());
+        let arrays = vec![self];

Review comment:
       Perhaps this hints the right direction: 
   - Possible performance speedup because no longer clone all array data and use only part of it (think about the total length is large and the slice length very small)
   - Make the resulting ArrayData self-contained, thus less chances to mystery bugs. In this case, perhaps it's better to rename `slice` as `copy_range` to avoid mis-understanding.
   
   The problem is: there are still 5 failures due to "not yet implemented" or "not supported"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9193: ARROW-11239: [Rust] Demonstrate unit test failure on array::transform::tests::test_struct [WILL BE CLOSED]

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#issuecomment-759606870


   https://issues.apache.org/jira/browse/ARROW-11239


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558238739



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -446,10 +446,37 @@ mod tests {
     }
 
     #[test]
-    fn test_equality() {
-        let int_data = ArrayData::builder(DataType::Int32).build();
-        let float_data = ArrayData::builder(DataType::Float32).build();
-        assert_ne!(int_data, float_data);
+    //#[should_panic(expected = "index out of bounds: the len is 0 but the index is 0")]
+    fn test_slice_equality() {
+        let mut bit_v: [u8; 1] = [0; 1];
+        bit_util::set_bit(&mut bit_v, 1);
+        let data = ArrayData::builder(DataType::Int32)
+            .len(2)
+            .null_bit_buffer(Buffer::from(bit_v))
+            .build();
+        let data_slice = data.slice(1, 1);
+
+        let mut bit_v: [u8; 1] = [0; 1];
+        bit_util::set_bit(&mut bit_v, 0);
+        let arc_data = ArrayData::builder(DataType::Int32)
+            .len(1)
+            .null_bit_buffer(Buffer::from(bit_v))
+            .build();
+        let expected = arc_data.as_ref();
+
+        // thread 'array::data::tests::test_slice_equality' panicked at 'index out of bounds:
+        // the len is 0 but the index is 0', arrow/src/array/equal/primitive.rs:36:23
+        assert_eq!(&data_slice, expected);
+
+        // I've got some background messages from the following PRs:
+        // - https://github.com/apache/arrow/pull/8200
+        // - https://github.com/apache/arrow/pull/8541
+        // - https://github.com/apache/arrow/pull/8590
+        //
+        // My thoughts:
+        // 1. don't clone all data, because it may be great waste when the length is far
+        //    smaller than the total data length (think about 1 vs 10k).
+        // 2. construct buffers from logic data.

Review comment:
       This is not `zero-copy` any more, and the whole behavior is no longer match the semantics of `slicing` -- sounds like `copy arrange`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9193: ARROW-11239: [Rust] Demonstrate unit test failure on array::transform::tests::test_struct [WILL BE CLOSED]

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#issuecomment-759533660


   I am aware of this, but I was unable to work on it. I can only work on this over this weekend :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558238739



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -446,10 +446,37 @@ mod tests {
     }
 
     #[test]
-    fn test_equality() {
-        let int_data = ArrayData::builder(DataType::Int32).build();
-        let float_data = ArrayData::builder(DataType::Float32).build();
-        assert_ne!(int_data, float_data);
+    //#[should_panic(expected = "index out of bounds: the len is 0 but the index is 0")]
+    fn test_slice_equality() {
+        let mut bit_v: [u8; 1] = [0; 1];
+        bit_util::set_bit(&mut bit_v, 1);
+        let data = ArrayData::builder(DataType::Int32)
+            .len(2)
+            .null_bit_buffer(Buffer::from(bit_v))
+            .build();
+        let data_slice = data.slice(1, 1);
+
+        let mut bit_v: [u8; 1] = [0; 1];
+        bit_util::set_bit(&mut bit_v, 0);
+        let arc_data = ArrayData::builder(DataType::Int32)
+            .len(1)
+            .null_bit_buffer(Buffer::from(bit_v))
+            .build();
+        let expected = arc_data.as_ref();
+
+        // thread 'array::data::tests::test_slice_equality' panicked at 'index out of bounds:
+        // the len is 0 but the index is 0', arrow/src/array/equal/primitive.rs:36:23
+        assert_eq!(&data_slice, expected);
+
+        // I've got some background messages from the following PRs:
+        // - https://github.com/apache/arrow/pull/8200
+        // - https://github.com/apache/arrow/pull/8541
+        // - https://github.com/apache/arrow/pull/8590
+        //
+        // My thoughts:
+        // 1. don't clone all data, because it may be great waste when the length is far
+        //    smaller than the total data length (think about 1 vs 10k).
+        // 2. construct buffers from logic data.

Review comment:
       This is not `zero-copy` any more, and the whole behavior is no longer match the semantics of `slicing` -- sounds like `copy arrange`. Looks like this is what `MutableArrayData` does.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#issuecomment-760706954


   Both ARROW-11239 and code of this PR were updated. It looks like the failures are relevant to `ArrayData::slice()`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Show some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558885348



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -219,17 +219,15 @@ impl ArrayData {
     ///
     /// Panics if `offset + length > self.len()`.
     pub fn slice(&self, offset: usize, length: usize) -> ArrayData {
-        assert!((offset + length) <= self.len());
-
-        let mut new_data = self.clone();
-
-        new_data.len = length;
-        new_data.offset = offset + self.offset;
-
-        new_data.null_count =
-            count_nulls(new_data.null_buffer(), new_data.offset, new_data.len);
+        self.copy_range(offset, length)
+    }
 
-        new_data
+    fn copy_range(&self, offset: usize, length: usize) -> ArrayData {
+        assert!((offset + length) <= self.len());
+        let arrays = vec![self];

Review comment:
       > Possible performance speedup
   
   Totally wrong, the benchmark of `array_slice` shows 4x - 11x regression.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558784766



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -123,11 +123,11 @@ impl TryFrom<Vec<(&str, ArrayRef)>> for StructArray {
                 len = Some(child_datum_len)
             }
             child_data.push(child_datum.clone());
-            fields.push(Field::new(
-                field_name,
-                array.data_type().clone(),
-                child_datum.null_buffer().is_some(),
-            ));
+            let null_buffer = child_datum.null_buffer();

Review comment:
       https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11263




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #9193: ARROW-11239: [Rust] Demonstrate unit test failure on array::transform::tests::test_struct [WILL BE CLOSED]

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558238739



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -446,10 +446,37 @@ mod tests {
     }
 
     #[test]
-    fn test_equality() {
-        let int_data = ArrayData::builder(DataType::Int32).build();
-        let float_data = ArrayData::builder(DataType::Float32).build();
-        assert_ne!(int_data, float_data);
+    //#[should_panic(expected = "index out of bounds: the len is 0 but the index is 0")]
+    fn test_slice_equality() {
+        let mut bit_v: [u8; 1] = [0; 1];
+        bit_util::set_bit(&mut bit_v, 1);
+        let data = ArrayData::builder(DataType::Int32)
+            .len(2)
+            .null_bit_buffer(Buffer::from(bit_v))
+            .build();
+        let data_slice = data.slice(1, 1);
+
+        let mut bit_v: [u8; 1] = [0; 1];
+        bit_util::set_bit(&mut bit_v, 0);
+        let arc_data = ArrayData::builder(DataType::Int32)
+            .len(1)
+            .null_bit_buffer(Buffer::from(bit_v))
+            .build();
+        let expected = arc_data.as_ref();
+
+        // thread 'array::data::tests::test_slice_equality' panicked at 'index out of bounds:
+        // the len is 0 but the index is 0', arrow/src/array/equal/primitive.rs:36:23
+        assert_eq!(&data_slice, expected);
+
+        // I've got some background messages from the following PRs:
+        // - https://github.com/apache/arrow/pull/8200
+        // - https://github.com/apache/arrow/pull/8541
+        // - https://github.com/apache/arrow/pull/8590
+        //
+        // My thoughts:
+        // 1. don't clone all data, because it may be great waste when the length is far
+        //    smaller than the total data length (think about 1 vs 10k).
+        // 2. construct buffers from logic data.

Review comment:
       This is not `zero-copy` any more, and the whole behavior is no longer match the semantics of `slicing` -- sounds like `copy arrange`. Looks like this is what `MutableArrayData` does.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Show some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558885348



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -219,17 +219,15 @@ impl ArrayData {
     ///
     /// Panics if `offset + length > self.len()`.
     pub fn slice(&self, offset: usize, length: usize) -> ArrayData {
-        assert!((offset + length) <= self.len());
-
-        let mut new_data = self.clone();
-
-        new_data.len = length;
-        new_data.offset = offset + self.offset;
-
-        new_data.null_count =
-            count_nulls(new_data.null_buffer(), new_data.offset, new_data.len);
+        self.copy_range(offset, length)
+    }
 
-        new_data
+    fn copy_range(&self, offset: usize, length: usize) -> ArrayData {
+        assert!((offset + length) <= self.len());
+        let arrays = vec![self];

Review comment:
       > Possible performance speedup
   Totally wrong, the benchmark of `array_slice` shows 4x - 11x regression.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#issuecomment-760706954


   Both ARROW-11239 and code of this PR were updated.
   https://github.com/apache/arrow/pull/9193/commits/b81176852536b66838ab33b706a0e404dc8765a6 added two tests to show the failures


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#issuecomment-761552645


   e20ee33 added new tests that passed locally.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #9193: ARROW-11239: [Rust] Show some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#issuecomment-760706954


   Both ARROW-11239 and code of this PR were updated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #9193:
URL: https://github.com/apache/arrow/pull/9193


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9193: ARROW-11239: [Rust] Some unit test failures

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9193:
URL: https://github.com/apache/arrow/pull/9193#discussion_r558804008



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -219,17 +219,17 @@ impl ArrayData {
     ///
     /// Panics if `offset + length > self.len()`.
     pub fn slice(&self, offset: usize, length: usize) -> ArrayData {
-        assert!((offset + length) <= self.len());
-
-        let mut new_data = self.clone();
-
-        new_data.len = length;
-        new_data.offset = offset + self.offset;
+        self.copy_range(offset, length)

Review comment:
       This is the attempt to implement slice with MutableArrayData, but triggers ARROW-11263




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org