You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/13 00:35:55 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #12702: ARROW-15062: [C++] Add memory information to current spans

westonpace commented on code in PR #12702:
URL: https://github.com/apache/arrow/pull/12702#discussion_r848961300


##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -97,10 +97,32 @@
 #include "arrow/util/logging.h"
 
 // For filename conversion
-#if defined(_WIN32)
+#ifdef _WIN32
 #include "arrow/util/utf8.h"
 #endif
 
+#ifdef _WIN32
+#include <psapi.h>
+#include <windows.h>
+
+#elif __unix__ || __unix || unix || (__APPLE__ && __MACH__)
+#include <sys/resource.h>
+
+#if __APPLE__ && __MACH__
+#include <mach/mach.h>
+
+#elif (_AIX || __TOS__AIX__) || (__sun__ || __sun || sun && (_SVR4 || __svr4__))
+#include <procfs.h>

Review Comment:
   Does this branch actually get used in the implementation?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -97,10 +97,32 @@
 #include "arrow/util/logging.h"
 
 // For filename conversion
-#if defined(_WIN32)
+#ifdef _WIN32
 #include "arrow/util/utf8.h"
 #endif
 
+#ifdef _WIN32
+#include <psapi.h>
+#include <windows.h>
+
+#elif __unix__ || __unix || unix || (__APPLE__ && __MACH__)
+#include <sys/resource.h>
+
+#if __APPLE__ && __MACH__

Review Comment:
   We use `#if __APPLE__` quite a bit in the code base.  However, we never use `__MACH__`.  Do you know what this is adding that `__APPLE__` doesn't already specify?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -97,10 +97,32 @@
 #include "arrow/util/logging.h"
 
 // For filename conversion
-#if defined(_WIN32)
+#ifdef _WIN32
 #include "arrow/util/utf8.h"
 #endif
 
+#ifdef _WIN32
+#include <psapi.h>
+#include <windows.h>
+
+#elif __unix__ || __unix || unix || (__APPLE__ && __MACH__)
+#include <sys/resource.h>

Review Comment:
   I'm not sure this include is used anywhere.



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1867,5 +1889,42 @@ uint64_t GetOptionalThreadId() {
   return (tid == 0) ? tid - 1 : tid;
 }
 
+// Returns the current resident set size (physical memory use) measured
+// in bytes, or zero if the value cannot be determined on this OS.
+// See: https://stackoverflow.com/a/14927379
+int64_t GetCurrentRSS() {
+#if defined(_WIN32)
+  // Windows --------------------------------------------------
+  PROCESS_MEMORY_COUNTERS info;
+  GetProcessMemoryInfo(GetCurrentProcess(), &info, sizeof(info));
+  return (int64_t)info.WorkingSetSize;

Review Comment:
   ```suggestion
     return static_cast<int64_t>(info.WorkingSetSize);
   ```
   For consistency with the rest of the codebase here (and elsewhere in this file) we should prefer C++ style casts:
   



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -97,10 +97,32 @@
 #include "arrow/util/logging.h"
 
 // For filename conversion
-#if defined(_WIN32)
+#ifdef _WIN32
 #include "arrow/util/utf8.h"
 #endif
 
+#ifdef _WIN32
+#include <psapi.h>
+#include <windows.h>
+
+#elif __unix__ || __unix || unix || (__APPLE__ && __MACH__)
+#include <sys/resource.h>
+
+#if __APPLE__ && __MACH__
+#include <mach/mach.h>
+
+#elif (_AIX || __TOS__AIX__) || (__sun__ || __sun || sun && (_SVR4 || __svr4__))
+#include <procfs.h>
+
+#elif __linux__ || __linux || linux || __gnu_linux__
+#include <stdio.h>
+
+#endif
+
+#else
+#error "Cannot define getPeakRSS( ) or getCurrentRSS( ) for an unknown OS."

Review Comment:
   Can we just use a dummy implementation in this case that returns 0 rather than prevent compilation?  Actually, it looks like that might be what you do already so can you just remove this error?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1867,5 +1889,42 @@ uint64_t GetOptionalThreadId() {
   return (tid == 0) ? tid - 1 : tid;
 }
 
+// Returns the current resident set size (physical memory use) measured
+// in bytes, or zero if the value cannot be determined on this OS.
+// See: https://stackoverflow.com/a/14927379

Review Comment:
   In general we try to avoid directly copy/pasting chunks of code from SO for licensing reasons but I don't know that we ever got an authoritative answer for this.  Either way, I think things are paraphrased enough by this point it isn't a concern.



##########
r/src/arrowExports.cpp:
##########
@@ -7879,12 +7879,12 @@ return Rf_ScalarLogical(
 );
 }
 static const R_CallMethodDef CallEntries[] = {
-{ "_arrow_available", (DL_FUNC)& _arrow_available, 0 },
-{ "_dataset_available", (DL_FUNC)& _dataset_available, 0 },
-{ "_engine_available", (DL_FUNC)& _engine_available, 0 },
-{ "_parquet_available", (DL_FUNC)& _parquet_available, 0 },
-{ "_s3_available", (DL_FUNC)& _s3_available, 0 },
-{ "_json_available", (DL_FUNC)& _json_available, 0 },
+		{ "_arrow_available", (DL_FUNC)& _arrow_available, 0 },

Review Comment:
   I'm not sure these changes are related.  Maybe a rebase will get them to go away?



##########
cpp/src/arrow/util/io_util.h:
##########
@@ -346,5 +346,11 @@ int64_t GetRandomSeed();
 ARROW_EXPORT
 uint64_t GetThreadId();
 
+/// \brief Get the current memory used by current process in bytes
+///
+/// This function support Windows and Linux

Review Comment:
   ```suggestion
   /// \brief Get the current memory used by the current process in bytes
   ///
   /// This function supports Windows, Linux, and Mac and will return 0 otherwise
   ```



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -97,10 +97,32 @@
 #include "arrow/util/logging.h"
 
 // For filename conversion
-#if defined(_WIN32)
+#ifdef _WIN32
 #include "arrow/util/utf8.h"
 #endif
 
+#ifdef _WIN32
+#include <psapi.h>
+#include <windows.h>

Review Comment:
   Nit: Can we merge these two ifdef sections?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1867,5 +1889,42 @@ uint64_t GetOptionalThreadId() {
   return (tid == 0) ? tid - 1 : tid;
 }
 
+// Returns the current resident set size (physical memory use) measured
+// in bytes, or zero if the value cannot be determined on this OS.
+// See: https://stackoverflow.com/a/14927379
+int64_t GetCurrentRSS() {
+#if defined(_WIN32)
+  // Windows --------------------------------------------------
+  PROCESS_MEMORY_COUNTERS info;
+  GetProcessMemoryInfo(GetCurrentProcess(), &info, sizeof(info));
+  return (int64_t)info.WorkingSetSize;
+
+#elif defined(__APPLE__) && defined(__MACH__)
+  // OSX ------------------------------------------------------
+  struct mach_task_basic_info info;
+  mach_msg_type_number_t infoCount = MACH_TASK_BASIC_INFO_COUNT;
+  if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &infoCount) !=
+      KERN_SUCCESS)
+    return (int64_t)0L;
+  return (int64_t)info.resident_size;
+
+#elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__)
+  // Linux ----------------------------------------------------
+  int64_t rss = 0L;
+  FILE* fp = NULL;
+  if ((fp = fopen("/proc/self/statm", "r")) == NULL) return (int64_t)0L;
+  if (fscanf(fp, "%*s%ld", &rss) != 1) {
+    fclose(fp);

Review Comment:
   `fclose` returns a return code.  If we want to ignore it (might be ok here, not sure in what situations that close would fail and what we would even do about it) then it might be best to wrap in ARROW_UNUSED so it is explicit or add a comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org