Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/06 08:40:08 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

jorgecarleitao opened a new pull request #8850:
URL: https://github.com/apache/arrow/pull/8850


   This allows `join`s, and any other operator that uses `MutableArrayData`, to work on `StructArray` columns.
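Conceptually, extending a struct array by a row range means extending each of its child arrays by that same range. The sketch below models this idea with hypothetical types (`Column`, `StructColumns`, `MutableStruct`) that only mimic the shape of `MutableArrayData::new` / `extend` / `freeze`. It is not the arrow crate's implementation: it supports just two child types and ignores the struct's own validity bitmap.

```rust
// Hypothetical, simplified model; NOT the arrow crate's API.
// Only two child types are modelled, and the struct-level validity bitmap is ignored.

/// A child column of one of two (hypothetical) types.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Utf8(Vec<Option<String>>),
    Int32(Vec<Option<i32>>),
}

impl Column {
    /// An empty column with the same type as `self`.
    fn empty_like(&self) -> Column {
        match self {
            Column::Utf8(_) => Column::Utf8(Vec::new()),
            Column::Int32(_) => Column::Int32(Vec::new()),
        }
    }

    /// Append the half-open row range `start..end` of `src` to `self`.
    fn extend_from(&mut self, src: &Column, start: usize, end: usize) {
        match (self, src) {
            (Column::Utf8(dst), Column::Utf8(s)) => dst.extend_from_slice(&s[start..end]),
            (Column::Int32(dst), Column::Int32(s)) => dst.extend_from_slice(&s[start..end]),
            _ => panic!("child column types do not match"),
        }
    }
}

/// A struct "array": named children that all have the same length.
#[derive(Debug, Clone, PartialEq)]
struct StructColumns {
    fields: Vec<(String, Column)>,
}

/// A MutableArrayData-like builder for struct columns (hypothetical name).
struct MutableStruct<'a> {
    sources: Vec<&'a StructColumns>,
    out: StructColumns,
}

impl<'a> MutableStruct<'a> {
    /// Start with empty children shaped like the first source's children.
    fn new(sources: Vec<&'a StructColumns>) -> Self {
        let fields = sources[0]
            .fields
            .iter()
            .map(|(name, col)| (name.clone(), col.empty_like()))
            .collect();
        MutableStruct { sources, out: StructColumns { fields } }
    }

    /// Copy rows `start..end` of source `index`: every child is extended by the same range.
    fn extend(&mut self, index: usize, start: usize, end: usize) {
        let src = self.sources[index];
        for ((_, dst), (_, s)) in self.out.fields.iter_mut().zip(&src.fields) {
            dst.extend_from(s, start, end);
        }
    }

    /// Finish building and return the result.
    fn freeze(self) -> StructColumns {
        self.out
    }
}
```

With the five-row `f1`/`f2` struct from `test_struct`, `extend(0, 1, 3)` in this model appends rows 1 and 2 to both children, giving `f1 = [None, None]` and `f2 = [Some(2), Some(3)]`. Passing several sources to `new` and calling `extend` with different `index` values is how a join-like operator would interleave rows from more than one input.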


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#discussion_r537273177



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -687,4 +728,118 @@ mod tests {
         let expected = Int16Array::from(vec![Some(1), None]);
         assert_eq!(result.keys(), &expected);
     }
+
+    #[test]
+    fn test_struct() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);

Review comment:
       I am sorry, I did not notice this comment. You are right; I will address it in a separate PR.






[GitHub] [arrow] github-actions[bot] commented on pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

github-actions[bot] commented on pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#issuecomment-739473852


   https://issues.apache.org/jira/browse/ARROW-10591




[GitHub] [arrow] Dandandan commented on a change in pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

Dandandan commented on a change in pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#discussion_r537007642



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -285,10 +288,16 @@ impl<'a> MutableArrayData<'a> {
     /// `use_nulls` is a flag used to optimize insertions. It should be `false` if the only source of nulls
     /// are the arrays themselves and `true` if the user plans to call [MutableArrayData::extend_nulls].
     /// In other words, if `use_nulls` is `false`, calling [MutableArrayData::extend_nulls] should not be used.
-    pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
+    pub fn new(arrays: Vec<&'a ArrayData>, mut use_nulls: bool, capacity: usize) -> Self {
         let data_type = arrays[0].data_type();
         use crate::datatypes::*;
 
+        // if any of the arrays has nulls, insertions from any array requires setting bits

Review comment:
       Maybe this code could move to the struct branch in the child data match below?






[GitHub] [arrow] Dandandan commented on a change in pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

Dandandan commented on a change in pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#discussion_r537274054



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -285,10 +288,16 @@ impl<'a> MutableArrayData<'a> {
     /// `use_nulls` is a flag used to optimize insertions. It should be `false` if the only source of nulls
     /// are the arrays themselves and `true` if the user plans to call [MutableArrayData::extend_nulls].
     /// In other words, if `use_nulls` is `false`, calling [MutableArrayData::extend_nulls] should not be used.
-    pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
+    pub fn new(arrays: Vec<&'a ArrayData>, mut use_nulls: bool, capacity: usize) -> Self {
         let data_type = arrays[0].data_type();
         use crate::datatypes::*;
 
+        // if any of the arrays has nulls, insertions from any array requires setting bits

Review comment:
       Ah, OK :+1:. In that case, it is probably better to keep it where it is.






[GitHub] [arrow] Dandandan commented on a change in pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

Dandandan commented on a change in pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#discussion_r537007642



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -285,10 +288,16 @@ impl<'a> MutableArrayData<'a> {
     /// `use_nulls` is a flag used to optimize insertions. It should be `false` if the only source of nulls
     /// are the arrays themselves and `true` if the user plans to call [MutableArrayData::extend_nulls].
     /// In other words, if `use_nulls` is `false`, calling [MutableArrayData::extend_nulls] should not be used.
-    pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
+    pub fn new(arrays: Vec<&'a ArrayData>, mut use_nulls: bool, capacity: usize) -> Self {
         let data_type = arrays[0].data_type();
         use crate::datatypes::*;
 
+        // if any of the arrays has nulls, insertions from any array requires setting bits

Review comment:
       Maybe this code could move to the struct branch in the match below?






[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

jorgecarleitao commented on a change in pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#discussion_r537263258



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -285,10 +288,16 @@ impl<'a> MutableArrayData<'a> {
     /// `use_nulls` is a flag used to optimize insertions. It should be `false` if the only source of nulls
     /// are the arrays themselves and `true` if the user plans to call [MutableArrayData::extend_nulls].
     /// In other words, if `use_nulls` is `false`, calling [MutableArrayData::extend_nulls] should not be used.
-    pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
+    pub fn new(arrays: Vec<&'a ArrayData>, mut use_nulls: bool, capacity: usize) -> Self {
         let data_type = arrays[0].data_type();
         use crate::datatypes::*;
 
+        // if any of the arrays has nulls, insertions from any array requires setting bits

Review comment:
       This change is unrelated to this PR: it comes from a bug-fix commit that this PR was built on top of. It is also required for non-struct arrays :)
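The diff comment "if any of the arrays has nulls, insertions from any array requires setting bits" can be illustrated with a small, type-agnostic sketch. The names `Source`, `needs_validity` and `extend_validity` are hypothetical and are not arrow APIs; the sketch treats a present bitmap as "may contain nulls", whereas real code would likely look at the null count. The point it shows is that once the output has a validity bitmap, extending from a source without one must still append "valid" bits, or the bitmap falls out of step with the values. Nothing here depends on the array being a struct, which matches the remark that the fix is needed for non-struct arrays too.

```rust
// Hypothetical sketch of the idea in the diff comment; NOT the arrow crate's code.

/// A source array reduced to its validity bitmap.
/// `None` means the array carries no bitmap (it has no nulls).
struct Source {
    validity: Option<Vec<bool>>,
}

/// The output needs a bitmap if the caller asked for one (`use_nulls`)
/// or if any source carries a bitmap (i.e. may contain nulls).
fn needs_validity(sources: &[Source], use_nulls: bool) -> bool {
    use_nulls || sources.iter().any(|s| s.validity.is_some())
}

/// Append validity bits for rows `start..end` of source `index`.
/// A source without a bitmap still contributes "valid" (true) bits,
/// so the output bitmap stays aligned with the appended values.
fn extend_validity(out: &mut Vec<bool>, sources: &[Source], index: usize, start: usize, end: usize) {
    match &sources[index].validity {
        Some(bits) => out.extend_from_slice(&bits[start..end]),
        None => out.extend(std::iter::repeat(true).take(end - start)),
    }
}
```

For example, taking rows `0..2` from a null-free source and then rows `1..3` from a source with bitmap `[true, false, true]` yields `[true, true, false, true]`: four bits for four appended rows.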






[GitHub] [arrow] alamb commented on a change in pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

alamb commented on a change in pull request #8850:
URL: https://github.com/apache/arrow/pull/8850#discussion_r537022972



##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -285,10 +288,16 @@ impl<'a> MutableArrayData<'a> {
     /// `use_nulls` is a flag used to optimize insertions. It should be `false` if the only source of nulls
     /// are the arrays themselves and `true` if the user plans to call [MutableArrayData::extend_nulls].
     /// In other words, if `use_nulls` is `false`, calling [MutableArrayData::extend_nulls] should not be used.
-    pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
+    pub fn new(arrays: Vec<&'a ArrayData>, mut use_nulls: bool, capacity: usize) -> Self {

Review comment:
       https://github.com/apache/arrow/pull/8848 appears to contain this same code, though GitHub seems to think merging will not be a problem.

##########
File path: rust/arrow/src/array/transform/mod.rs
##########
@@ -687,4 +728,118 @@ mod tests {
         let expected = Int16Array::from(vec![Some(1), None]);
         assert_eq!(result.keys(), &expected);
     }
+
+    #[test]
+    fn test_struct() {
+        let strings: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("joe"),
+            None,
+            None,
+            Some("mark"),
+            Some("doe"),
+        ]));
+        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
+            Some(1),
+            Some(2),
+            Some(3),
+            Some(4),
+            Some(5),
+        ]));
+
+        let array =
+            StructArray::try_from(vec![("f1", strings.clone()), ("f2", ints.clone())])
+                .unwrap()
+                .data();
+        let arrays = vec![array.as_ref()];
+        let mut mutable = MutableArrayData::new(arrays, false, 0);
+
+        mutable.extend(0, 1, 3);

Review comment:
       I wonder if ensuring the slice covers an actual string value would be useful (this slice only takes the two `None`, `None` values). If it were `mutable.extend(0, 2, 4)`, it would also include `Some("mark")`. The same comment applies to `test_struct_nulls` as well.
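The reviewer's observation can be checked with plain slices. Judging from the comment, `extend(index, start, end)` copies the half-open row range `start..end`, so on the `f1` values of `test_struct` the current call picks up only nulls. The helper `covered` is hypothetical; it only stands in for that range selection.

```rust
/// Hypothetical helper: the rows that `extend(_, start, end)` would copy,
/// i.e. the half-open range `start..end` of a column.
fn covered<T>(column: &[T], start: usize, end: usize) -> &[T] {
    &column[start..end]
}
```

With `strings = [Some("joe"), None, None, Some("mark"), Some("doe")]`, `covered(&strings, 1, 3)` yields `[None, None]`, while `covered(&strings, 2, 4)` yields `[None, Some("mark")]`, so the suggested range exercises both a null and a real string value.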






[GitHub] [arrow] jorgecarleitao closed pull request #8850: ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

jorgecarleitao closed pull request #8850:
URL: https://github.com/apache/arrow/pull/8850


   

