Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/01 04:56:25 UTC

[GitHub] [arrow] edponce commented on a change in pull request #12055: ARROW-11989: [C++][Python] Improve ChunkedArray's complexity for the access of elements

edponce commented on a change in pull request #12055:
URL: https://github.com/apache/arrow/pull/12055#discussion_r840229428



##########
File path: cpp/src/arrow/chunked_array.cc
##########
@@ -147,13 +148,15 @@ bool ChunkedArray::ApproxEquals(const ChunkedArray& other,
 }
 
 Result<std::shared_ptr<Scalar>> ChunkedArray::GetScalar(int64_t index) const {
-  for (const auto& chunk : chunks_) {
-    if (index < chunk->length()) {
-      return chunk->GetScalar(index);
-    }
-    index -= chunk->length();
+  if (!chunk_resolver_) {
+    chunk_resolver_ = internal::make_unique<internal::ChunkResolver>(chunks_);

Review comment:
       For ChunkedArrays with a large number of Arrays, creating the offsets for the ChunkResolver adds noticeable overhead, because computing them requires a pass over every chunk. If the application does not access the data multiple times (i.e., multiple `GetScalar()` calls), that overhead may outweigh the potential benefits. On the other hand, the lazy approach requires synchronization mechanisms.
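To make the trade-off concrete, here is a minimal C++ sketch. It assumes a resolver that stores prefix-sum offsets and binary-searches them, and it uses `std::call_once` as one possible synchronization mechanism for lazy construction. `Chunk`, `LinearLocate`, `ChunkResolverSketch` and `LazyChunkedSketch` are hypothetical names for illustration only; they are not Arrow's `internal::ChunkResolver` API. `LinearLocate` mirrors the loop removed in the diff above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Arrow chunk: only its length matters here.
struct Chunk {
  int64_t length;
};

// (chunk index, index within that chunk)
using Location = std::pair<int64_t, int64_t>;

// Mirrors the removed loop: no setup cost, but O(num_chunks) per lookup.
std::optional<Location> LinearLocate(const std::vector<Chunk>& chunks, int64_t index) {
  if (index < 0) return std::nullopt;
  for (size_t i = 0; i < chunks.size(); ++i) {
    if (index < chunks[i].length) return Location(static_cast<int64_t>(i), index);
    index -= chunks[i].length;
  }
  return std::nullopt;  // index is past the end
}

// Hypothetical resolver: O(num_chunks) to build the prefix-sum offsets,
// then O(log num_chunks) per lookup via binary search.
class ChunkResolverSketch {
 public:
  explicit ChunkResolverSketch(const std::vector<Chunk>& chunks) {
    offsets_.reserve(chunks.size() + 1);
    offsets_.push_back(0);
    for (const Chunk& c : chunks) offsets_.push_back(offsets_.back() + c.length);
  }

  int64_t length() const { return offsets_.back(); }

  // Precondition: 0 <= index < length().
  Location Resolve(int64_t index) const {
    assert(index >= 0 && index < length());
    // The first offset strictly greater than `index` marks the end of the
    // owning chunk; upper_bound also skips over empty chunks.
    auto it = std::upper_bound(offsets_.begin(), offsets_.end(), index);
    int64_t chunk = static_cast<int64_t>(it - offsets_.begin()) - 1;
    return Location(chunk, index - offsets_[chunk]);
  }

 private:
  // offsets_[i] is the logical start of chunk i; the last entry is the total length.
  std::vector<int64_t> offsets_;
};

// Builds the resolver lazily on the first lookup. std::call_once supplies the
// synchronization needed when several threads call Locate() concurrently.
class LazyChunkedSketch {
 public:
  explicit LazyChunkedSketch(std::vector<Chunk> chunks) : chunks_(std::move(chunks)) {}

  std::optional<Location> Locate(int64_t index) const {
    std::call_once(once_, [this] {
      resolver_ = std::make_unique<ChunkResolverSketch>(chunks_);
    });
    if (index < 0 || index >= resolver_->length()) return std::nullopt;
    return resolver_->Resolve(index);
  }

 private:
  std::vector<Chunk> chunks_;
  mutable std::once_flag once_;
  mutable std::unique_ptr<ChunkResolverSketch> resolver_;
};
```

Building the resolver eagerly in the constructor would remove the need for `std::once_flag`, but every chunked array would then pay the O(num_chunks) offset computation even if no element is ever accessed. The lazy variant defers that cost to the first lookup, at the price of the synchronization the comment mentions.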




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
