You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/13 13:37:48 UTC

[GitHub] [arrow] pitrou commented on a diff in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

pitrou commented on code in PR #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r849479300


##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -97,8 +97,16 @@
 #include "arrow/util/logging.h"
 
 // For filename conversion
-#if defined(_WIN32)
+#ifdef _WIN32

Review Comment:
   This does not really fall into the "filename conversion" category, can you move these additions up?



##########
cpp/src/arrow/util/tracing_internal.cc:
##########
@@ -18,6 +18,10 @@
 #include "arrow/util/tracing_internal.h"
 #include "arrow/util/tracing.h"
 
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/sysinfo.h>

Review Comment:
   Two things: 1) can we use the C++ includes where available (e.g. `<cstdio>` instead of `<stdio.h`>)? 2) can we keep the system includes alphabetically ordered?



##########
cpp/src/arrow/compute/exec/aggregate_node.cc:
##########
@@ -171,14 +171,16 @@ class ScalarAggregateNode : public ExecNode {
     START_SPAN(span, "Consume",
                {{"aggregate", ToStringExtra()},
                 {"node.label", label()},
-                {"batch.length", batch.length}});
+                {"batch.length", batch.length},
+                GET_MEMORY_POOL_INFO});

Review Comment:
   Instead of adding this by hand everywhere, can we define for example a `START_COMPUTE_SPAN` macro that does it automatically?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1867,5 +1875,41 @@ uint64_t GetOptionalThreadId() {
   return (tid == 0) ? tid - 1 : tid;
 }
 
+// Returns the current resident set size (physical memory use) measured
+// in bytes, or zero if the value cannot be determined on this OS.
+int64_t GetCurrentRSS() {
+#if defined(_WIN32)
+  // Windows --------------------------------------------------
+  PROCESS_MEMORY_COUNTERS info;
+  GetProcessMemoryInfo(GetCurrentProcess(), &info, sizeof(info));
+  return static_cast<int64_t>(info.WorkingSetSize);
+
+#elif defined(__APPLE__)
+  // OSX ------------------------------------------------------
+  struct mach_task_basic_info info;
+  mach_msg_type_number_t infoCount = MACH_TASK_BASIC_INFO_COUNT;
+  if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &infoCount) !=
+      KERN_SUCCESS)
+    return (int64_t)0L;
+  return (int64_t)info.resident_size;
+
+#elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__)
+  // Linux ----------------------------------------------------
+  int64_t rss = 0L;
+  FILE* fp = NULL;
+  if ((fp = fopen("/proc/self/statm", "r")) == NULL) return static_cast<int64_t>(0L);

Review Comment:
   Same here: log the error?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1867,5 +1875,41 @@ uint64_t GetOptionalThreadId() {
   return (tid == 0) ? tid - 1 : tid;
 }
 
+// Returns the current resident set size (physical memory use) measured
+// in bytes, or zero if the value cannot be determined on this OS.
+int64_t GetCurrentRSS() {
+#if defined(_WIN32)
+  // Windows --------------------------------------------------
+  PROCESS_MEMORY_COUNTERS info;
+  GetProcessMemoryInfo(GetCurrentProcess(), &info, sizeof(info));
+  return static_cast<int64_t>(info.WorkingSetSize);
+
+#elif defined(__APPLE__)
+  // OSX ------------------------------------------------------
+  struct mach_task_basic_info info;
+  mach_msg_type_number_t infoCount = MACH_TASK_BASIC_INFO_COUNT;
+  if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &infoCount) !=
+      KERN_SUCCESS)
+    return (int64_t)0L;
+  return (int64_t)info.resident_size;
+
+#elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__)
+  // Linux ----------------------------------------------------
+  int64_t rss = 0L;

Review Comment:
   Should be `long` for `%ld`. But perhaps it would be better to use `std::fstream`, so that you can avoid the `fscanf` call?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1867,5 +1875,41 @@ uint64_t GetOptionalThreadId() {
   return (tid == 0) ? tid - 1 : tid;
 }
 
+// Returns the current resident set size (physical memory use) measured
+// in bytes, or zero if the value cannot be determined on this OS.
+int64_t GetCurrentRSS() {
+#if defined(_WIN32)
+  // Windows --------------------------------------------------
+  PROCESS_MEMORY_COUNTERS info;
+  GetProcessMemoryInfo(GetCurrentProcess(), &info, sizeof(info));
+  return static_cast<int64_t>(info.WorkingSetSize);
+
+#elif defined(__APPLE__)
+  // OSX ------------------------------------------------------
+  struct mach_task_basic_info info;
+  mach_msg_type_number_t infoCount = MACH_TASK_BASIC_INFO_COUNT;
+  if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &infoCount) !=
+      KERN_SUCCESS)
+    return (int64_t)0L;

Review Comment:
   Can we log the error using e.g. `ARROW_LOG(WARNING)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org