You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/03/24 02:40:03 UTC

[GitHub] [arrow-rs] wjones127 opened a new pull request, #3925: feat: add take for MapArray

wjones127 opened a new pull request, #3925:
URL: https://github.com/apache/arrow-rs/pull/3925

   # Which issue does this PR close?
   
   Closes #3875
   
   # Rationale for this change
    
   This seems to be one of the only array types that isn't supported by this function.
   
   # What changes are included in this PR?
   
   Adds an easy conversion between `MapArray` and `ListArray`, which allows us to reuse kernels that support `ListArray`.
   
   # Are there any user-facing changes?
   
   No breaking changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1147061867


##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),
+                old_data.len(),
+                Some(old_data.null_count()),
+                old_data.nulls().map(|nulls|nulls.buffer().clone()),
+                old_data.offset(),
+                old_data.buffers().to_vec(),
+                old_data.child_data().to_vec(),
+            )
+        };

Review Comment:
   Is there a better way to create a copy of `ArrayData` that's the same except for the type?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1148279751


##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),
+                old_data.len(),
+                Some(old_data.null_count()),
+                old_data.nulls().map(|nulls|nulls.buffer().clone()),
+                old_data.offset(),
+                old_data.buffers().to_vec(),
+                old_data.child_data().to_vec(),
+            )
+        };

Review Comment:
   Thanks, that's much better!



##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {

Review Comment:
   I removed this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3925: feat: add take for MapArray

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1148279838


##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),

Review Comment:
   I'm not sure what you meant here, but I deleted this implementation so not sure it's relevant anyways.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1147195320


##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {

Review Comment:
   ```suggestion
   impl TryFrom<ListArray> for MapArray {
   ```
   To be consistent with other array conversions, and to make the clone explicit if necessary



##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),
+                old_data.len(),
+                Some(old_data.null_count()),
+                old_data.nulls().map(|nulls|nulls.buffer().clone()),
+                old_data.offset(),
+                old_data.buffers().to_vec(),
+                old_data.child_data().to_vec(),
+            )
+        };
+        Ok(MapArray::from(array_data))
+    }
+}
+
+impl From<&MapArray> for ListArray {

Review Comment:
   ```suggestion
   impl From<MapArray> for ListArray {
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1147197754


##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),

Review Comment:
   This does potentially change the data type...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1147202853


##########
arrow-select/src/take.rs:
##########
@@ -150,6 +150,11 @@ where
                 *length as u32,
             )?))
         }
+        DataType::Map(_, _) => {
+            let list_arr = ListArray::from(values.as_map());
+            let list_data = take_list::<_, Int32Type>(&list_arr, indices)?;
+            Ok(Arc::new(MapArray::try_from(&list_data)?))

Review Comment:
   ```suggestion
            let builder = list_data.into_builder().data_type(...);
               Ok(Arc::new(MapArray::from(builder.build_unchecked())?))
   ```
   I.e. remove the try_from conversion?



##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),

Review Comment:
   This data type inference is potentially problematic, as it loses part



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3925: feat: add take for MapArray

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3925:
URL: https://github.com/apache/arrow-rs/pull/3925#discussion_r1147193311


##########
arrow-array/src/array/map_array.rs:
##########
@@ -251,6 +251,69 @@ impl std::fmt::Debug for MapArray {
     }
 }
 
+impl TryFrom<&ListArray> for MapArray {
+    type Error = ArrowError;
+    fn try_from(value: &ListArray) -> Result<Self, Self::Error> {
+        let field = match value.data_type() {
+            DataType::List(field) => {
+                if let DataType::Struct(fields) = field.data_type() {
+                    if fields.len() != 2 {
+                        Err(ArrowError::InvalidArgumentError(
+                            "List item must be a struct type with two fields".to_string(),
+                        ))
+                    } else {
+                        Ok(field)
+                    }
+                } else {
+                    Err(ArrowError::InvalidArgumentError(
+                        "List item must be a struct type to convert to a list"
+                            .to_string(),
+                    ))
+                }
+            }
+            _ => unreachable!("This should be a list type."),
+        }?;
+        let old_data = value.to_data();
+        let array_data = unsafe {
+            ArrayData::new_unchecked(
+                DataType::Map(field.clone(), false),
+                old_data.len(),
+                Some(old_data.null_count()),
+                old_data.nulls().map(|nulls|nulls.buffer().clone()),
+                old_data.offset(),
+                old_data.buffers().to_vec(),
+                old_data.child_data().to_vec(),
+            )
+        };

Review Comment:
   into_builder()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org