You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/27 17:25:12 UTC

[GitHub] [arrow] bjchambers opened a new pull request #9339: ARROW-11394: [Rust] Slice & Concat

bjchambers opened a new pull request #9339:
URL: https://github.com/apache/arrow/pull/9339


   I took a stab at adding some tests for concat (and the underlying MutableArrayData) that cover the use case of the array having an offset. I had some problems with the assertions on the string array which makes me think this isn't fully fixed, but I wanted to put it up and see if anyone had suggestions for how to address this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9339: ARROW-11394: [Rust] Tests for Slice & Concat

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-770358289


   Hi @bjchambers, I think that the issue pointed to by this PR is about a bug, this I do not think that this PR fixes it. Did you created an issue for this, or should I create one and re-link this PR to it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#discussion_r565511184



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -714,6 +714,48 @@ mod tests {
         assert_eq!(array, expected)
     }
 
+    #[test]
+    fn test_struct_offset() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .slice(1, 3)
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);
+        let data = mutable.freeze();
+        let array = StructArray::from(Arc::new(data));
+
+        // Struct equality doesn't seem to work when using slices?

Review comment:
       Related to #9211 . When we `slice` an array, we increase its `offset`. However, for StructArrays, I think that we also need to increase the offset of the childs (or something like that). :(




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#discussion_r565514345



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -714,6 +714,48 @@ mod tests {
         assert_eq!(array, expected)
     }
 
+    #[test]
+    fn test_struct_offset() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .slice(1, 3)
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);
+        let data = mutable.freeze();
+        let array = StructArray::from(Arc::new(data));
+
+        // Struct equality doesn't seem to work when using slices?

Review comment:
       More specifically with @nevi-me 's analysis [here](https://github.com/apache/arrow/pull/9211#issuecomment-761743850)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bjchambers edited a comment on pull request #9339: ARROW-11394: [Rust] Tests for Slice & Concat

Posted by GitBox <gi...@apache.org>.
bjchambers edited a comment on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-770390308


   The issue pointed to is the bug I filed and the tests here were based on the test in the bug report. Why do you think this doesn't fix that bug?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-768445212


   https://issues.apache.org/jira/browse/ARROW-11394


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bjchambers commented on a change in pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
bjchambers commented on a change in pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#discussion_r565520744



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -714,6 +714,48 @@ mod tests {
         assert_eq!(array, expected)
     }
 
+    #[test]
+    fn test_struct_offset() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .slice(1, 3)
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);
+        let data = mutable.freeze();
+        let array = StructArray::from(Arc::new(data));
+
+        // Struct equality doesn't seem to work when using slices?

Review comment:
       That makes sense. It seems like it may also affect the string array. When I create the expected string directly instead of slicing it this test passes (see newest commit).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bjchambers commented on pull request #9339: ARROW-11394: [Rust] Tests for Slice & Concat

Posted by GitBox <gi...@apache.org>.
bjchambers commented on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-770392624


   Heh. No worries. Thanks for the quick review!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9339: ARROW-11394: [Rust] Tests for Slice & Concat

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9339:
URL: https://github.com/apache/arrow/pull/9339


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bjchambers commented on pull request #9339: ARROW-11394: [Rust] Tests for Slice & Concat

Posted by GitBox <gi...@apache.org>.
bjchambers commented on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-770390308


   The issue pointed to is the big I filed and the tests here were based on the test in the bug report. Why do you think this doesn't fix that bug?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-768455225


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=h1) Report
   > Merging [#9339](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=desc) (bb89b19) into [master](https://codecov.io/gh/apache/arrow/commit/ab5fc979c69ccc5dde07e1bc1467b02951b4b7e9?el=desc) (ab5fc97) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9339/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9339      +/-   ##
   ==========================================
   + Coverage   81.89%   81.92%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       52988    53093     +105     
   ==========================================
   + Hits        43392    43497     +105     
     Misses       9596     9596              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow/pull/9339/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL21vZC5ycw==) | `93.15% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/structure.rs](https://codecov.io/gh/apache/arrow/pull/9339/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3N0cnVjdHVyZS5ycw==) | `80.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow/pull/9339/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbmNhdC5ycw==) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=footer). Last update [ab5fc97...bb89b19](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bjchambers commented on a change in pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
bjchambers commented on a change in pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#discussion_r565494646



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -714,6 +714,48 @@ mod tests {
         assert_eq!(array, expected)
     }
 
+    #[test]
+    fn test_struct_offset() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .slice(1, 3)
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);
+        let data = mutable.freeze();
+        let array = StructArray::from(Arc::new(data));
+
+        // Struct equality doesn't seem to work when using slices?

Review comment:
       This has me suspicious of something not working. Specifically, using equality on the struct or the strings doesn't seem to work as I would expect.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bjchambers commented on a change in pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
bjchambers commented on a change in pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#discussion_r565526450



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -714,6 +714,48 @@ mod tests {
         assert_eq!(array, expected)
     }
 
+    #[test]
+    fn test_struct_offset() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .slice(1, 3)
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);
+        let data = mutable.freeze();
+        let array = StructArray::from(Arc::new(data));
+
+        // Struct equality doesn't seem to work when using slices?

Review comment:
       Ah. That makes sense. Thanks for the clarification!
   
   It's unrelated to this PR but I was also wondering about the implementation of extend for structs. For other types, it seems like it is relatively efficient (extending with an entire slice) but for the struct case it seems like it iterates over each row and extends a single row at a time. Is that an actual issue? Would addressing that require the changes to struct you mention?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-768455225


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=h1) Report
   > Merging [#9339](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=desc) (0b7415d) into [master](https://codecov.io/gh/apache/arrow/commit/ab5fc979c69ccc5dde07e1bc1467b02951b4b7e9?el=desc) (ab5fc97) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9339/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9339      +/-   ##
   ==========================================
   + Coverage   81.89%   81.92%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       52988    53101     +113     
   ==========================================
   + Hits        43392    43505     +113     
     Misses       9596     9596              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow/pull/9339/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL21vZC5ycw==) | `93.20% <100.00%> (+0.27%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/structure.rs](https://codecov.io/gh/apache/arrow/pull/9339/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3N0cnVjdHVyZS5ycw==) | `83.33% <100.00%> (+3.33%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow/pull/9339/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbmNhdC5ycw==) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=footer). Last update [ab5fc97...0b7415d](https://codecov.io/gh/apache/arrow/pull/9339?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9339: ARROW-11394: [Rust] Slice & Concat

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#discussion_r565522791



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -714,6 +714,48 @@ mod tests {
         assert_eq!(array, expected)
     }
 
+    #[test]
+    fn test_struct_offset() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .slice(1, 3)
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);
+        let data = mutable.freeze();
+        let array = StructArray::from(Arc::new(data));
+
+        // Struct equality doesn't seem to work when using slices?

Review comment:
       I am sorry, I was not very specific. There are two different issues: for string arrays, binary, etc., I think that the issue is solved with #9211, via https://github.com/apache/arrow/pull/9211/files#diff-74d8df3798e724950c2eb5aae1a838fc2f2b9e35d89ace2627e8e7a64584c71fR91
   
   For the Struct, we need another patch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9339: ARROW-11394: [Rust] Tests for Slice & Concat

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9339:
URL: https://github.com/apache/arrow/pull/9339#issuecomment-770391099


   @bjchambers , I am really sorry, I do not know where my head was when I concluded that. You are obviously right. Let me merge this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org