You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/24 12:21:25 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

lidavidm commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r834246278



##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       Yes, this seems it will be quite expensive. Do we need information that's this fine-grained?
   
   We could cache the values, and only update them if they're out of date. Or we could actually record the memory pool's statistics instead. Also, this information isn't really thread- or span- specific. Maybe it could be done by a background thread on a fixed schedule, especially if it's process memory and not memory pool allocations? 

##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});
     for (size_t i = 0; i < kernels_.size(); ++i) {
       util::tracing::Span span;
-      START_SPAN(span, aggs_[i].function,
-                 {{"function.name", aggs_[i].function},
-                  {"function.options",
-                   aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
-                  {"function.kind", std::string(kind_name()) + "::Consume"}});
+      START_SPAN(
+          span, aggs_[i].function,
+          {{"function.name", aggs_[i].function},
+           {"function.options",
+            aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
+           {"function.kind", std::string(kind_name()) + "::Consume"},
+           {"memory_pool_bytes", plan_->exec_context()->memory_pool()->bytes_allocated()},
+           {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+           {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       This could be done by a span processor, though that has a couple caveats:
   - Processors are configured as part of the tracing setup, by the end user application (generally not by a library),
   - IIRC, processors may not run immediately (one of the processors batches spans and processes them on a background thread, I think)
   
   I think some span recorders can also be configured to attach this sort of information as well? (Though again, the timestamps won't line up anymore.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org