Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/22 15:50:28 UTC

[GitHub] [arrow-rs] tfeda commented on a diff in pull request #1720: Implementation string concat

tfeda commented on code in PR #1720:
URL: https://github.com/apache/arrow-rs/pull/1720#discussion_r878891034


##########
arrow/src/compute/kernels/concat.rs:
##########
@@ -102,6 +102,25 @@ pub fn concat(arrays: &[&dyn Array]) -> Result<ArrayRef> {
     Ok(make_array(mutable.freeze()))
 }
 
+// Elementwise concatenation of StringArrays
+pub fn string_concat<Offset: OffsetSizeTrait>(
+    left: &GenericStringArray<Offset>,
+    right: &GenericStringArray<Offset>,
+) -> Result<GenericStringArray<Offset>> {
+    let left_bitmap = left.data().null_bitmap().unwrap();
+    let right_bitmap = right.data().null_bitmap().unwrap();
+    let concat_bitmap = (left_bitmap & right_bitmap).unwrap();
+    Ok((0..left.len().max(right.len()))
+        .map(|i| {
+            if concat_bitmap.is_set(i) {
+                Some(left.value(i).to_owned() + right.value(i))
+            } else {
+                None
+            }
+        })
+        .collect::<GenericStringArray<Offset>>())

Review Comment:
   A couple thoughts, because I spent some time digging into this:
   1. @alamb's example in #1720 suggests that "valid_str" + `None` => `None`, whereas both implementations in this PR go for "valid_str" + `None` => "valid_str". As a user, I'd prefer the latter, but I thought I'd make a note of it. If the former is chosen, then `compute::util` has [`combine_option_bitmap()`](https://github.com/apache/arrow-rs/blob/4de689598df6ea284452e687d69c7654b5a71762/arrow/src/compute/util.rs#L31), which works nicely for the null handling.
   2. How do we handle cases where concatenating two arrays results in the offsets overflowing? There should probably be a test case for that.
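
To make the two points above concrete, here is a rough std-only sketch. It does not use the arrow crate: arrays are modelled as slices of `Option<&str>` and offsets as a plain `Vec<i32>`. The function names (`concat_propagate_nulls`, `concat_nulls_as_empty`, `checked_concat_offsets`) are hypothetical, and treating `None` + `None` as `None` in the second variant is my assumption, not something the PR specifies.

```rust
use std::convert::TryFrom;

/// Null-propagating semantics: the result is null if either input is null
/// ("valid_str" + None => None).
fn concat_propagate_nulls(
    left: &[Option<&str>],
    right: &[Option<&str>],
) -> Vec<Option<String>> {
    left.iter()
        .copied()
        .zip(right.iter().copied())
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(format!("{}{}", l, r)),
            _ => None,
        })
        .collect()
}

/// Null-as-empty semantics: a null on one side is treated as ""
/// ("valid_str" + None => "valid_str").
/// Assumption: the result is null only when both inputs are null.
fn concat_nulls_as_empty(
    left: &[Option<&str>],
    right: &[Option<&str>],
) -> Vec<Option<String>> {
    left.iter()
        .copied()
        .zip(right.iter().copied())
        .map(|(l, r)| match (l, r) {
            (None, None) => None,
            (l, r) => Some(format!("{}{}", l.unwrap_or(""), r.unwrap_or(""))),
        })
        .collect()
}

/// Builds i32 offsets for an element-wise concatenation from the byte
/// lengths of each input element. Returns None as soon as the running
/// total no longer fits in an i32, instead of wrapping silently.
fn checked_concat_offsets(left_lens: &[usize], right_lens: &[usize]) -> Option<Vec<i32>> {
    let mut offsets = Vec::with_capacity(left_lens.len() + 1);
    let mut total: i32 = 0;
    offsets.push(total);
    for (l, r) in left_lens.iter().zip(right_lens.iter()) {
        // Length of the concatenated element, checked in usize first.
        let len = l.checked_add(*r)?;
        // The element length itself must fit in the offset type.
        let len = i32::try_from(len).ok()?;
        // The running end offset must fit as well.
        total = total.checked_add(len)?;
        offsets.push(total);
    }
    Some(offsets)
}
```

Working from lengths rather than real strings means an overflow test does not need to allocate 2 GiB of data: `checked_concat_offsets(&[i32::MAX as usize], &[1])` yields `None`. A test for the actual kernel would still have to decide whether overflow should surface as an `Err` or a panic.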
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.