You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/06 09:03:26 UTC

[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

houqp commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436250767



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -577,6 +620,78 @@ where
         self
     }
 
+    /// Appends data from other arrays into the builder
+    ///
+    /// This is most useful when concatenating arrays of the same type into a builder.
+    fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> {
+        // determine the latest offset on the builder
+        let mut cum_offset = if self.offsets_builder.len() == 0 {
+            0
+        } else {
+            // peek into buffer to get last appended offset
+            let buffer = self.offsets_builder.buffer.data();
+            let len = self.offsets_builder.len();
+            let (start, end) = ((len - 1) * 4, len * 4);
+            let slice = &buffer[start..end];
+            i32::from_le_bytes(slice.try_into().unwrap())
+        };
+        for array in data {
+            if let DataType::List(_) = array.data_type() {
+                if array.child_data().len() != 1 {
+                    return Err(ArrowError::InvalidArgumentError(
+                        "When appending list arrays, data must contain 1 child_data element"
+                            .to_string(),
+                    ));
+                }
+                let len = array.len();
+                if len == 0 {
+                    continue;
+                }
+                let offset = array.offset();
+
+                // `typed_data` is unsafe, however this call is safe as `ListArray` has i32 offsets
+                unsafe {

Review comment:
       looks like scope of this unsafe block can be reduced?

##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -450,6 +455,44 @@ impl<T: ArrowPrimitiveType> ArrayBuilder for PrimitiveBuilder<T> {
         self.values_builder.len
     }
 
+    /// Appends data from other arrays into the builder
+    ///
+    /// This is most useful when concatenating arrays of the same type into a builder.
+    fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> {
+        let mul = T::get_bit_width() / 8;
+        for array in data {
+            if array.data_type() != &T::get_data_type() {

Review comment:
       if one of the array data has the wrong type, the append will be partially applied, is this intentional?

##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder {
         self.len
     }
 
+    /// Appends data from other arrays into the builder
+    ///
+    /// This is most useful when concatenating arrays of the same type into a builder.
+    fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> {
+        for array in data {
+            if let DataType::Struct(fields) = array.data_type() {
+                if &self.fields != fields {
+                    return Err(ArrowError::InvalidArgumentError(
+                        "Struct arrays are not the same".to_string(),

Review comment:
       nitpick, perhaps `have different fields` would be a better wording here.

##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder {
         self.len
     }
 
+    /// Appends data from other arrays into the builder
+    ///
+    /// This is most useful when concatenating arrays of the same type into a builder.
+    fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> {
+        for array in data {
+            if let DataType::Struct(fields) = array.data_type() {
+                if &self.fields != fields {
+                    return Err(ArrowError::InvalidArgumentError(
+                        "Struct arrays are not the same".to_string(),
+                    ));
+                }
+                let len = array.len();
+                if len == 0 {
+                    continue;
+                }
+                let offset = array.offset();
+                let results: Result<Vec<()>> = self
+                    .field_builders
+                    .iter_mut()
+                    .zip(array.child_data())
+                    .map(|(builder, child_data)| {
+                        // slice child_data to account for offsets
+                        let child_array = make_array(child_data.clone());
+                        let sliced = child_array.slice(offset, len);
+                        builder.append_data(&[sliced.data()])
+                    })
+                    .collect();
+                results?;

Review comment:
       minor, if results is not used, then it's better to use a for loop over zipped iterators above to avoid creating a temporary  vector at the end.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org