Posted to github@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/04/18 14:55:09 UTC

[GitHub] [arrow] felipecrv commented on a diff in pull request #35098: GH-35097: [C++] ArrayData support for child_data slice.

felipecrv commented on code in PR #35098:
URL: https://github.com/apache/arrow/pull/35098#discussion_r1170157451


##########
cpp/src/arrow/array/data.cc:
##########
@@ -144,6 +144,8 @@ std::shared_ptr<ArrayData> ArrayData::Slice(int64_t off, int64_t len) const {
   } else {
     copy->null_count = null_count != 0 ? kUnknownNullCount : 0;
   }
+  for (auto& child : copy->child_data) 
+    child = child->Slice(copy->offset, copy->length);

Review Comment:
   @Light-City *avoiding materialization* is a common theme in query execution. Databases often work on data that is larger than memory, or on a dataset that already takes up all available memory by itself. Allocating more memory to produce intermediate results is a no-no, and the Arrow design honors this tradition. That's why it can seem a bit counterintuitive to people used to the array functions of common programming languages.
   
   For instance, in JavaScript, `Array.prototype.slice(begin, end)` creates a new array. In database-speak, we say that "arr.slice() materializes the slice". An alternative implementation could avoid the memory allocation by returning an object that holds a reference to the original array together with the bounds of the slice. That would require every function you normally use to work with arrays to be aware of the slice boundaries. It would complicate JavaScript too much, but for Arrow that is exactly the intended design: every compute kernel in Arrow has to be aware of the offset and length to minimize materialization, and thus memory consumption.
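
To make the contrast concrete, here is a minimal sketch in plain C++ of the two slicing strategies described above. It does not use the Arrow API: `Int64View`, `MakeView`, `MaterializeSlice`, and `Sum` are hypothetical names invented for illustration, and the struct is only a simplified stand-in for the (buffer, offset, length) idea, not the real `ArrayData`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an Arrow-style array: a shared buffer
// plus a logical window (offset, length) into it. This is NOT arrow::ArrayData.
struct Int64View {
  std::shared_ptr<const std::vector<int64_t>> buffer;
  int64_t offset;
  int64_t length;

  // Zero-copy slice: only the window changes; the buffer is shared, not copied.
  // `off` is relative to this view, so slicing a slice composes the offsets.
  Int64View Slice(int64_t off, int64_t len) const {
    assert(off >= 0 && len >= 0 && off + len <= length);
    return Int64View{buffer, offset + off, len};
  }
};

// Wraps a vector in a view that covers the whole buffer.
Int64View MakeView(std::vector<int64_t> values) {
  auto buf = std::make_shared<const std::vector<int64_t>>(std::move(values));
  const int64_t len = static_cast<int64_t>(buf->size());
  return Int64View{std::move(buf), 0, len};
}

// Materializing slice, analogous to JavaScript's arr.slice(): it allocates a
// new buffer and copies the selected elements into it.
std::vector<int64_t> MaterializeSlice(const std::vector<int64_t>& values,
                                      int64_t off, int64_t len) {
  return std::vector<int64_t>(values.begin() + off, values.begin() + off + len);
}

// A toy "kernel". Because slices are not materialized, it has to honor the
// view's offset and length instead of walking the whole buffer.
int64_t Sum(const Int64View& view) {
  const int64_t* data = view.buffer->data() + view.offset;
  int64_t total = 0;
  for (int64_t i = 0; i < view.length; ++i) {
    total += data[i];
  }
  return total;
}
```

With a view built by `MakeView({1, 2, 3, 4, 5, 6})`, calling `Slice(2, 3)` yields a view that shares the same buffer and covers the elements 3, 4, 5, so `Sum` returns 12 without any copy being made. The price is the one named above: every function that consumes a view, like `Sum` here, has to apply the offset and length itself.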



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org