Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/10 19:24:31 UTC

[GitHub] [arrow-datafusion] WinkerDu commented on a diff in pull request #2183: MINOR: use arrow kernel `take` to avoid value copy in `string_concat`

WinkerDu commented on code in PR #2183:
URL: https://github.com/apache/arrow-datafusion/pull/2183#discussion_r846830459


##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -430,17 +431,17 @@ fn string_concat(left: ArrayRef, right: ArrayRef) -> Result<ArrayRef> {
         scalar_value => scalar_value.into_array(left.clone().len()),
     };
     let ignore_null_array = ignore_null.as_any().downcast_ref::<StringArray>().unwrap();
-    let result = (0..ignore_null_array.len())
+    let index_array = (0..ignore_null_array.len())
         .into_iter()
         .map(|index| {
             if left.is_null(index) || right.is_null(index) {
                 None
             } else {
-                Some(ignore_null_array.value(index))
+                Some(index as u32)
             }
         })
-        .collect::<StringArray>();
-
+        .collect::<UInt32Array>();
+    let result = take(ignore_null_array, &index_array, None)?;

Review Comment:
   I think we can optimize the whole string concat process to avoid this value copying, something like:
   
   - original process: run the built-in `concat`, which ignores `NULL` -> generate a validity array or an array of indexes -> take the valid values from the `concat` output array according to the bitmap or indexes
   - optimized process: concatenate the two input string arrays with `NULL` handled directly, so no value copy is needed any more.
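
A minimal sketch of what the optimized process could look like, written against the standard library only. `SimpleStringArray` and `concat_null_aware` are hypothetical names used for illustration; the struct is a simplified stand-in for an Arrow-style string column (offsets, one contiguous value buffer, validity flags) and is not the arrow crate's `StringArray`. The point is only that `NULL` is decided while the output is being built, so no second pass over the values is needed.

```rust
/// Hypothetical, simplified stand-in for an Arrow-style string column.
/// This is NOT the arrow crate's `StringArray`.
#[derive(Debug, PartialEq)]
pub struct SimpleStringArray {
    /// `offsets[i]..offsets[i + 1]` is the byte range of row `i` in `values`.
    offsets: Vec<usize>,
    /// All row contents stored back to back.
    values: String,
    /// `validity[i]` is false when row `i` is NULL.
    validity: Vec<bool>,
}

impl SimpleStringArray {
    /// Build a column from optional rows; NULL rows occupy zero bytes.
    pub fn from_options(rows: &[Option<&str>]) -> Self {
        let mut offsets = Vec::with_capacity(rows.len() + 1);
        offsets.push(0);
        let mut values = String::new();
        let mut validity = Vec::with_capacity(rows.len());
        for row in rows {
            if let Some(s) = row {
                values.push_str(s);
            }
            offsets.push(values.len());
            validity.push(row.is_some());
        }
        Self { offsets, values, validity }
    }

    pub fn len(&self) -> usize {
        self.validity.len()
    }

    /// Return row `i`, or `None` when the row is NULL.
    pub fn get(&self, i: usize) -> Option<&str> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Row-wise concatenation in a single pass: a row is NULL in the output
/// when it is NULL in either input, and only valid rows write any bytes.
pub fn concat_null_aware(
    left: &SimpleStringArray,
    right: &SimpleStringArray,
) -> SimpleStringArray {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    let mut offsets = Vec::with_capacity(left.len() + 1);
    offsets.push(0);
    let mut values = String::with_capacity(left.values.len() + right.values.len());
    let mut validity = Vec::with_capacity(left.len());
    for i in 0..left.len() {
        match (left.get(i), right.get(i)) {
            (Some(l), Some(r)) => {
                values.push_str(l);
                values.push_str(r);
                validity.push(true);
            }
            // Either side is NULL: mark the row invalid and write nothing.
            _ => validity.push(false),
        }
        offsets.push(values.len());
    }
    SimpleStringArray { offsets, values, validity }
}
```

Each input value is copied once into the output buffer, and there is no intermediate `concat` result from which valid rows have to be taken afterwards. Whether the same shape maps cleanly onto the real Arrow builders is something this sketch does not settle.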


