You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/24 07:57:19 UTC

[GitHub] [arrow] AlvinJ15 opened a new pull request #12702: ARROW-15062: [C++] Add memory information to current spams, bytes in …

AlvinJ15 opened a new pull request #12702:
URL: https://github.com/apache/arrow/pull/12702


   Add memory information to the current spans, bytes in memory_pool and RSS


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r834246278



##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       Yes, this seems it will be quite expensive. Do we need information that's this fine-grained?
   
   We could cache the values, and only update them if they're out of date. Or we could actually record the memory pool's statistics instead. Also, this information isn't really thread- or span- specific. Maybe it could be done by a background thread on a fixed schedule, especially if it's process memory and not memory pool allocations? 

##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});
     for (size_t i = 0; i < kernels_.size(); ++i) {
       util::tracing::Span span;
-      START_SPAN(span, aggs_[i].function,
-                 {{"function.name", aggs_[i].function},
-                  {"function.options",
-                   aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
-                  {"function.kind", std::string(kind_name()) + "::Consume"}});
+      START_SPAN(
+          span, aggs_[i].function,
+          {{"function.name", aggs_[i].function},
+           {"function.options",
+            aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
+           {"function.kind", std::string(kind_name()) + "::Consume"},
+           {"memory_pool_bytes", plan_->exec_context()->memory_pool()->bytes_allocated()},
+           {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+           {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       This could be done by a span processor, though that has a couple caveats:
   - Processors are configured as part of the tracing setup, by the end user application (generally not by a library),
   - IIRC, processors may not run immediately (one of the processors batches spans and processes them on a background thread, I think)
   
   I think some span recorders can also be configured to attach this sort of information as well? (Though again, the timestamps won't line up anymore.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] AlvinJ15 commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
AlvinJ15 commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r837933770



##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");
+  size_t result = -1;
+  char line[128];
+
+  while (fgets(line, 128, file) != NULL){
+    if (strncmp(line, "VmRSS:", 6) == 0){
+      result = parseLine(line);
+      break;
+    }
+  }
+  fclose(file);
+  return result*1000;
+}
+
+size_t GetMemoryUsed() {
+  size_t total_memory_size;
+  size_t used_memory_size;
+  struct sysinfo si;
+  sysinfo(&si);

Review comment:
       @westonpace I get the bytes from the memory pool in the first span, but in the next ones I extracted the statistics from the sysinfo for report the RSS, I looked the PR which you mentioned, and there the method return just the FreeRam and for calculate the Current Ram in used I still need to use sysinfo.total_memory, but after that PR get merged I can add another function for get that, and don't use system includes here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r834734593



##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       Background thread on a fixed schedule was the original ask but I think there was some concern around having two streams of telemetry information.  As a start I think only using `memory_pool_bytes` would be sufficient.  Longer term it could be handy to have some idea of the RSS of the process just to have some sense for how we are doing on fragmentation and non-pool allocations.  Since a system call is expensive I wonder if there would be a way to debounce this information and only report it at most every 10ms or on some configurable threshold.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r834739482



##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});
     for (size_t i = 0; i < kernels_.size(); ++i) {
       util::tracing::Span span;
-      START_SPAN(span, aggs_[i].function,
-                 {{"function.name", aggs_[i].function},
-                  {"function.options",
-                   aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
-                  {"function.kind", std::string(kind_name()) + "::Consume"}});
+      START_SPAN(
+          span, aggs_[i].function,
+          {{"function.name", aggs_[i].function},
+           {"function.options",
+            aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
+           {"function.kind", std::string(kind_name()) + "::Consume"},
+           {"memory_pool_bytes", plan_->exec_context()->memory_pool()->bytes_allocated()},
+           {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+           {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       Given that these memory statistics are process-wide anyways maybe it makes sense to only enable collection of memory statistics in our own profiling tools.  For example, when running the query tester we can add a span processor that collects memory statistics.
   
   Then the actual library code would only instrument things specific to the library.
   
   That only leaves the batching concern but, glancing at the description, and knowing nothing of the OT spec, I think the span processor itself might batch its output but its input (the span start/stop calls) won't be batched.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r837987266



##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");
+  size_t result = -1;
+  char line[128];
+
+  while (fgets(line, 128, file) != NULL){
+    if (strncmp(line, "VmRSS:", 6) == 0){
+      result = parseLine(line);
+      break;
+    }
+  }
+  fclose(file);
+  return result*1000;
+}
+
+size_t GetMemoryUsed() {
+  size_t total_memory_size;
+  size_t used_memory_size;
+  struct sysinfo si;
+  sysinfo(&si);

Review comment:
       Ah, yes, I see now that you are grabbing all three numbers.  `GetMemoryUsed` is less interesting than `GetMemoryUsedByProcess` in my opinion but I don't think it would hurt to report it.
   
   > but after that PR get merged I can add another function for get that, and don't use system includes here.
   
   That PR that I linked to will not be merged.  We ended up pursuing a different approach.  So we don't want to wait on it.  However, you're welcome to copy the approach that was being taken for this method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12702: ARROW-15062: [C++] Add memory information to current spams, bytes in …

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#issuecomment-1077338527






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r834730496



##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");
+  size_t result = -1;
+  char line[128];
+
+  while (fgets(line, 128, file) != NULL){
+    if (strncmp(line, "VmRSS:", 6) == 0){
+      result = parseLine(line);
+      break;
+    }
+  }
+  fclose(file);
+  return result*1000;
+}
+
+size_t GetMemoryUsed() {
+  size_t total_memory_size;
+  size_t used_memory_size;
+  struct sysinfo si;
+  sysinfo(&si);

Review comment:
       There was a portable implementation as part of a PR here: https://github.com/apache/arrow/pull/11426/files#diff-5d7d9a549780da2ec4baba08a43db3e1d524b70d6e8ae4f29cc8bfe831357f9cR178-R199 
   
   We didn't end up pursuing the PR but you might be able to borrow some from that particular function.
   
   That being said, I wonder if it would be more valuable to report `default_memory_pool()->bytes_allocated()`.  `sysinfo` is going to give us statistics on the entire server and `freeram` is sometimes misleading (e.g. if there are a lot of dirty pages then `freeram` could be high while at the same time we might be thrashing swap).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spams, bytes in …

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r834065668



##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){

Review comment:
       We try to write C++ code not C code, so this function should take a `util::string_view` IMHO.

##########
File path: cpp/src/arrow/util/tracing_internal.h
##########
@@ -48,6 +48,12 @@ namespace tracing {
 ARROW_EXPORT
 opentelemetry::trace::Tracer* GetTracer();
 
+ARROW_EXPORT
+size_t GetMemoryUsed();
+
+ARROW_EXPORT
+size_t GetMemoryUsedByProcess();

Review comment:
       These functions should go into `arrow/util/io_util.{h,cc}` instead.

##########
File path: cpp/src/arrow/util/tracing_internal.h
##########
@@ -48,6 +48,12 @@ namespace tracing {
 ARROW_EXPORT
 opentelemetry::trace::Tracer* GetTracer();
 
+ARROW_EXPORT
+size_t GetMemoryUsed();
+
+ARROW_EXPORT
+size_t GetMemoryUsedByProcess();

Review comment:
       Please return `int64_t` instead.

##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");
+  size_t result = -1;
+  char line[128];
+
+  while (fgets(line, 128, file) != NULL){
+    if (strncmp(line, "VmRSS:", 6) == 0){
+      result = parseLine(line);
+      break;
+    }
+  }
+  fclose(file);
+  return result*1000;
+}
+
+size_t GetMemoryUsed() {
+  size_t total_memory_size;
+  size_t used_memory_size;
+  struct sysinfo si;
+  sysinfo(&si);

Review comment:
       This is unfortunately Linux-specific, we'll need a portable implementation (is there a third-party library that we can use to make our life easier?).

##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");
+  size_t result = -1;
+  char line[128];
+
+  while (fgets(line, 128, file) != NULL){
+    if (strncmp(line, "VmRSS:", 6) == 0){
+      result = parseLine(line);
+      break;
+    }
+  }
+  fclose(file);
+  return result*1000;

Review comment:
       Are you sure it's not 1024, rather?

##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");

Review comment:
       1) You'll need to check for errors here; 2) would it be easier to use `std::fstream`?

##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});
     for (size_t i = 0; i < kernels_.size(); ++i) {
       util::tracing::Span span;
-      START_SPAN(span, aggs_[i].function,
-                 {{"function.name", aggs_[i].function},
-                  {"function.options",
-                   aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
-                  {"function.kind", std::string(kind_name()) + "::Consume"}});
+      START_SPAN(
+          span, aggs_[i].function,
+          {{"function.name", aggs_[i].function},
+           {"function.options",
+            aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"},
+           {"function.kind", std::string(kind_name()) + "::Consume"},
+           {"memory_pool_bytes", plan_->exec_context()->memory_pool()->bytes_allocated()},
+           {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+           {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       We should certainly avoid repeating this everywhere, and instead find a way to factor out those common span attributes (perhaps using a dedicated macro?).

##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       These calls are potentially costly since they will be issueing system calls (and perhaps even read a file). I'm not sure this is desirable. @lidavidm What do you think?

##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!

Review comment:
       Which one? The one returned by the function?

##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!
+  FILE* file = fopen("/proc/self/status", "r");

Review comment:
       Also, this is probably Linux-specific...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] AlvinJ15 commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
AlvinJ15 commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r837028159



##########
File path: cpp/src/arrow/util/tracing_internal.cc
##########
@@ -184,6 +188,41 @@ opentelemetry::trace::Tracer* GetTracer() {
   return tracer.get();
 }
 
+int parseLine(char* line){
+  // This assumes that a digit will be found and the line ends in " Kb".
+  int i = strlen(line);
+  const char* p = line;
+  while (*p <'0' || *p > '9') p++;
+  line[i-3] = '\0';
+  i = atoi(p);
+  return i;
+}
+
+size_t GetMemoryUsedByProcess() { //Note: this value is in KB!

Review comment:
       The value obtained from the system. This function was re-implemented and documented in the io_utils.h

##########
File path: cpp/src/arrow/util/tracing_internal.h
##########
@@ -48,6 +48,12 @@ namespace tracing {
 ARROW_EXPORT
 opentelemetry::trace::Tracer* GetTracer();
 
+ARROW_EXPORT
+size_t GetMemoryUsed();
+
+ARROW_EXPORT
+size_t GetMemoryUsedByProcess();

Review comment:
       moved




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] AlvinJ15 commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
AlvinJ15 commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r837931262



##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       @westonpace I can create the background thread for report the `system RAM`, but where can I place it? inside the memory pool constructor?  and using the same exporter or creating a new one just for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r837984168



##########
File path: cpp/src/arrow/compute/exec/aggregate_node.cc
##########
@@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode {
 
   Status DoConsume(const ExecBatch& batch, size_t thread_index) {
     util::tracing::Span span;
-    START_SPAN(span, "Consume",
-               {{"aggregate", ToStringExtra()},
-                {"node.label", label()},
-                {"batch.length", batch.length}});
+    START_SPAN(
+        span, "Consume",
+        {{"aggregate", ToStringExtra()},
+         {"node.label", label()},
+         {"batch.length", batch.length},
+         {"memory_pool_bytes",
+          plan_->exec_context()->memory_pool()->bytes_allocated()},
+         {"memory_used", arrow::internal::tracing::GetMemoryUsed()},
+         {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}});

Review comment:
       To be clear, I am not asking for a background thread in that comment.  It is ok either way.  I have no preference either way.  I only mentioned the background thread in the comment to explain the context.
   
   In both cases (background thread or span processor) I now think we want to move the initialization into the query testing tool (which unfortunately, is not yet merged).  This means the feature will not be enabled for library users but I think that is ok.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org