You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/01 12:12:13 UTC

[GitHub] [arrow] zagto commented on a diff in pull request #13138: ARROW-16562: Avoid slicing array inputs in ExecBatchIterator that would result in one slice

zagto commented on code in PR #13138:
URL: https://github.com/apache/arrow/pull/13138#discussion_r886730951


##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -308,16 +308,30 @@ bool ExecBatchIterator::Next(ExecBatch* batch) {
   // Now, fill the batch
   batch->values.resize(args_.size());
   batch->length = iteration_size;
-  for (size_t i = 0; i < args_.size(); ++i) {
-    if (args_[i].is_scalar()) {
-      batch->values[i] = args_[i].scalar();
-    } else if (args_[i].is_array()) {
-      batch->values[i] = args_[i].array()->Slice(position_, iteration_size);
-    } else {
-      const ChunkedArray& carr = *args_[i].chunked_array();
-      const auto& chunk = carr.chunk(chunk_indexes_[i]);
-      batch->values[i] = chunk->data()->Slice(chunk_positions_[i], iteration_size);
-      chunk_positions_[i] += iteration_size;
+
+  if (iteration_size == length_) {
+    ARROW_DCHECK_EQ(position_, 0);
+    for (size_t i = 0; i < args_.size(); ++i) {
+      if (args_[i].kind() == Datum::CHUNKED_ARRAY) {
+        const ChunkedArray& carr = *args_[i].chunked_array();
+        batch->values[i] = Datum(carr.chunk(chunk_indexes_[i])->data());
+        chunk_positions_[i] += iteration_size;
+      } else {
+        batch->values[i] = std::move(args_[i]);
+      }

Review Comment:
   I think you're right about not having to update `chunk_positions_[i]`. But the suggested change wouldn't work. Moving the chunked array Datum directly into batch->values means it ends up getting passed to the kernel, which usually expects an ARRAY type Datum. The important part why we need a branch here is extracting the ArrayData from the chunked array



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org