Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/09 12:29:26 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1787: Arbitrary size concat elements utf8

tustvold commented on code in PR #1787:
URL: https://github.com/apache/arrow-rs/pull/1787#discussion_r893438397


##########
arrow/src/compute/kernels/concat_elements.rs:
##########
@@ -29,54 +29,75 @@ use crate::error::{ArrowError, Result};
 ///
 ///   ["Hello"] + ["World"] = ["HelloWorld"]
 ///
-///   ["a", "b"] + [None, "c"] = [None, "bc"]
+///   ["a", "b"] + [None, "c"] + [None, "d"] = [None, "bcd"]
 /// ```
 ///
-/// An error will be returned if `left` and `right` have different lengths
+/// An error will be returned if the [`StringArray`] are of different lengths.
 pub fn concat_elements_utf8<Offset: OffsetSizeTrait>(
-    left: &GenericStringArray<Offset>,
-    right: &GenericStringArray<Offset>,
+    arrays: &[&GenericStringArray<Offset>],
 ) -> Result<GenericStringArray<Offset>> {
-    if left.len() != right.len() {
+    if arrays.is_empty() {
+        return Err(ArrowError::ComputeError(
+            "concat requires input of at least one array".to_string(),
+        ));
+    }
+
+    let size = arrays[0].len();
+    if !arrays.iter().all(|array| array.len() == size) {
         return Err(ArrowError::ComputeError(format!(
-            "Arrays must have the same length: {} != {}",
-            left.len(),
-            right.len()
+            "Arrays must have the same length of {}",
+            size,
         )));
     }
 
-    let output_bitmap = combine_option_bitmap(&[left.data(), right.data()], left.len())?;
-
-    let left_offsets = left.value_offsets();
-    let right_offsets = right.value_offsets();
-
-    let left_buffer = left.value_data();
-    let right_buffer = right.value_data();
-    let left_values = left_buffer.as_slice();
-    let right_values = right_buffer.as_slice();
+    let output_bitmap = combine_option_bitmap(
+        arrays
+            .iter()
+            .map(|a| a.data())
+            .collect::<Vec<_>>()
+            .as_slice(),
+        size,
+    )?;
+
+    let data_buffers = arrays
+        .iter()
+        .map(|array| array.value_data())
+        .collect::<Vec<_>>();
+
+    let data_values = data_buffers
+        .iter()
+        .map(|buffer| buffer.as_slice())
+        .collect::<Vec<_>>();
+
+    let mut offsets = arrays
+        .iter()
+        .map(|a| a.value_offsets().iter().peekable())
+        .collect::<Vec<_>>();
 
     let mut output_values = BufferBuilder::<u8>::new(
-        left_values.len() + right_values.len()
-            - left_offsets[0].to_usize().unwrap()
-            - right_offsets[0].to_usize().unwrap(),
+        data_values
+            .iter()
+            .zip(offsets.iter_mut())
+            .map(|(data, offset)| data.len() - offset.peek().unwrap().to_usize().unwrap())

Review Comment:
   Yes, offsets can be empty. It should be a simple case of adding something like this near the top of the function, after the `arrays.is_empty()` check so that `arrays[0]` is always in bounds:
   
   ```rust
   if arrays[0].is_empty() {
     return make_empty(arrays[0].data_type())
   }
   ```
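
   A minimal, std-only sketch of the element-wise semantics in the diff together with the zero-length guard suggested above. It does not use arrow-rs: `StringColumn`, `from_opts`, and `concat_elements` are hypothetical stand-ins for `GenericStringArray` and `concat_elements_utf8`, and a plain `Vec<bool>` replaces the validity bitmap. Because the function returns before the capacity calculation when the inputs have zero rows, it never reads the first offset of a possibly empty offsets buffer (the `peek().unwrap()` in the diff).

   ```rust
   /// Hypothetical, simplified model of an Arrow-style UTF-8 array (not the arrow-rs type):
   /// `offsets` normally has `len + 1` entries but may be empty for a zero-length array,
   /// `values` holds all string bytes back to back, and `validity[i] == false` marks a null.
   #[derive(Debug, Clone, PartialEq)]
   pub struct StringColumn {
       pub offsets: Vec<usize>,
       pub values: Vec<u8>,
       pub validity: Vec<bool>,
   }

   impl StringColumn {
       /// Demo helper: builds a column from optional strings.
       pub fn from_opts(items: &[Option<&str>]) -> Self {
           let mut offsets = vec![0];
           let mut values = Vec::new();
           let mut validity = Vec::new();
           for item in items {
               if let Some(s) = item {
                   values.extend_from_slice(s.as_bytes());
               }
               // Null rows get a zero-length slot: the offset does not advance.
               offsets.push(values.len());
               validity.push(item.is_some());
           }
           StringColumn { offsets, values, validity }
       }

       /// Number of rows; an empty offsets buffer counts as zero rows.
       pub fn len(&self) -> usize {
           self.offsets.len().saturating_sub(1)
       }

       /// Reads row `i` back as an optional string.
       pub fn get(&self, i: usize) -> Option<&str> {
           if !self.validity[i] {
               return None;
           }
           std::str::from_utf8(&self.values[self.offsets[i]..self.offsets[i + 1]]).ok()
       }
   }

   /// Hypothetical analogue of `concat_elements_utf8` over any number of inputs.
   pub fn concat_elements(arrays: &[&StringColumn]) -> Result<StringColumn, String> {
       // Same first check as the diff: at least one input is required.
       let first = arrays.first().ok_or("concat requires input of at least one array")?;
       let size = first.len();
       if !arrays.iter().all(|a| a.len() == size) {
           return Err(format!("Arrays must have the same length of {}", size));
       }
       // The guard from the review: with zero rows, return early so nothing
       // below indexes into an offsets buffer that may be empty.
       if size == 0 {
           return Ok(StringColumn { offsets: vec![0], values: Vec::new(), validity: Vec::new() });
       }
       // Capacity hint: bytes referenced by each input (last offset minus first).
       let capacity: usize = arrays.iter().map(|a| a.offsets[size] - a.offsets[0]).sum();

       let mut values = Vec::with_capacity(capacity);
       let mut offsets = Vec::with_capacity(size + 1);
       offsets.push(0);
       let mut validity = Vec::with_capacity(size);
       for row in 0..size {
           // A row is null if any input is null there (the role of combine_option_bitmap).
           let valid = arrays.iter().all(|a| a.validity[row]);
           if valid {
               for a in arrays {
                   values.extend_from_slice(&a.values[a.offsets[row]..a.offsets[row + 1]]);
               }
           }
           offsets.push(values.len());
           validity.push(valid);
       }
       Ok(StringColumn { offsets, values, validity })
   }
   ```

   With inputs built from `["a", "b"]`, `[None, "c"]` and `[None, "d"]`, the result reads back as `[None, "bcd"]`, matching the doc comment in the diff. The capacity hint here uses last-minus-first offset rather than `data.len()` minus the first offset; both are only allocation estimates and do not affect the output.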
