You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/05 18:08:05 UTC

[GitHub] [arrow] rok opened a new pull request, #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

rok opened a new pull request, #13516:
URL: https://github.com/apache/arrow/pull/13516

   This is to resolve [ARROW-16981](https://issues.apache.org/jira/browse/ARROW-16981).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1253707222

   Thank you @pitrou & @benibus! :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r967016471


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);

Review Comment:
   The issue here is that jemalloc's `mallctl` interface takes either `size_t`, `uint64_t` or `uint64_t*` depending on the `name` parameter. E.g.:
   ```
   stats.allocated (size_t) r- [--enable-stats]
   Total number of bytes allocated by the application.
   
   thread.allocated (uint64_t) r- [--enable-stats]
   Get the total number of bytes ever allocated by the calling thread. This counter has the potential to wrap around; it is up to the application to appropriately interpret the counter in such cases.
   
   thread.allocatedp (uint64_t *) r- [--enable-stats]
   Get a pointer to the the value that is returned by the [thread.allocated](http://jemalloc.net/jemalloc.3.html#thread.allocated) mallctl. This is useful for avoiding the overhead of repeated mallctl*() calls. Note that the underlying counter should not be modified by the application.
   ```
   
   If `Result<int64_t> jemalloc_get_stat(const char* name);`-like interface is ok I'd move to it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
benibus commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r974332150


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   I should have suggestions up later today.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966342844


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,
+                                     void* newp, size_t newlen);
+
+ARROW_EXPORT Status jemalloc_stats_print(void (*write_cb)(void*, const char*),
+                                         void* cbopaque, const char* opts);

Review Comment:
   I made this only take `opts`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1242747793

   I don't know, why not? Can be a separate JIRA, though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r973570843


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   Actually, @benibus , would you like to help @rok on this? (or even suggest a patch that he can apply :-))



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972782721


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   So this should probably be:
   ```c++
   Status jemalloc_stats_print(std::function<void(const char*)> write_cb, const char* opts = "");
   ```
   (or perhaps keep passing `write_cb` as a pointer, which feels less idiomatic but can avoid a move/copy)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972789027


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<uint64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);

Review Comment:
   Can you add a comment explaining what this is for?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966344579


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +175,48 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+Status jemalloc_get_stat(const char* name, size_t* out);
+
+/// \brief The mallctl() function provides a general interface for introspecting the
+/// memory allocator, as well as setting modifiable parameters and triggering actions.
+/// The period-separated name argument specifies a location in a tree-structured
+/// namespace; see the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// more information on the tree contents. To read a value, pass a pointer via oldp to
+/// adequate space to contain the value, and a pointer to its length via oldlenp;
+/// otherwise pass NULL and NULL. Similarly, to write a value, pass a pointer to the
+/// value via newp, and its length via newlen; otherwise pass NULL and 0.
+///
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,
+                                     void* newp, size_t newlen);
+
+/// \brief The malloc_stats_print() function writes summary statistics via the write_cb

Review Comment:
   Reduced and referenced the jemalloc docs for flags.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1241882258

   presumably because uint64_t and size_t are actually the same type on those platforms (also, we use raw pointers for out parameters, not references, typically)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r973566367


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   I can't figure out a way to pass `std::function<void(void*, const char*)>*` to `malloc_stats_print` without getting `signal 11: SIGSEGV`.
   As per [jemalloc docs](https://jemalloc.net/jemalloc.3.html): "The malloc_stats_print() function writes summary statistics via the write_cb callback function pointer and cbopaque data passed to write_cb, or malloc_message() if write_cb is NULL."
   It also states: "The malloc_message variable allows the programmer to override the function which emits the text strings forming the errors and warnings if for some reason the STDERR_FILENO file descriptor is not suitable for this. malloc_message() takes the cbopaque pointer argument that is NULL unless overridden by the arguments in a call to malloc_stats_print(), followed by a string pointer. Please note that doing anything which tries to allocate memory in this function is likely to result in a crash or deadlock."
   
   Any idea how to proceed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972900560


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);

Review Comment:
   Changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r973521117


##########
cpp/src/arrow/flight/sql/types.h:
##########
@@ -19,6 +19,7 @@
 
 #include <cstdint>
 #include <iosfwd>
+#include <optional>

Review Comment:
   I had to add this to fix CI failures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r974192989


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -19,9 +19,11 @@
 
 #include <atomic>
 #include <cstdint>
+#include <functional>

Review Comment:
   This was fixed upstream.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1251739545

   > Here are some potential changes for implementing the callback API with `std::function`, which uses closures instead of a client-provided `void*`. I updated the relevant tests as well.
   
   This looks good to me! Thanks a lot @benibus !
   
   > Hopefully this is the correct method of suggesting patches... the github diffs appear to be against the base branch, not the latest commit.
   
   Well this was pretty nice for me but maybe extra work for you :). I added you as a collaborator to [my fork](https://github.com/rok/arrow) in case you'd want to push more changes to this branch and avoid the UI. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r976372733


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,59 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<int64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  // Update the statistics cached by mallctl.
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, nullptr, 0);
+
+  if (err) {
+    return arrow::internal::IOErrorFromErrno(err, "Failed retrieving ", &name);
+  }
+
+  return value;
+}
+
+Status jemalloc_peak_reset() {
+  int err = mallctl("thread.peak.reset", nullptr, nullptr, nullptr, 0);
+  return err ? arrow::internal::IOErrorFromErrno(err, "Failed resetting thread.peak.")
+             : Status::OK();
+}
+
+Result<std::string> jemalloc_stats_string(const char* opts) {
+  std::string stats;
+  auto write_cb = [&stats](const char* str) { stats.append(str); };
+  ARROW_UNUSED(jemalloc_stats_print(write_cb, opts));
+  return stats;
+}
+
+Status jemalloc_stats_print(const char* opts) {
+  malloc_stats_print(nullptr, nullptr, opts);
+  return Status::OK();
+}
+
+Status jemalloc_stats_print(std::function<void(const char*)> write_cb, const char* opts) {
+  auto cb_wrapper = [](void* opaque, const char* str) {
+    (*static_cast<std::function<void(const char*)>*>(opaque))(str);
+  };
+  if (write_cb) {

Review Comment:
   I think not but not sure. I removed it for now. @benibus do you think this would be needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r963108068


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -151,6 +151,26 @@ Status jemalloc_set_decay_ms(int ms) {
   return Status::OK();
 }
 
+Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp, void* newp,
+                        size_t newlen) {
+#ifdef ARROW_JEMALLOC
+  int res = mallctl(name, oldp, oldlenp, newp, newlen);
+  return res ? Status::UnknownError(res) : Status::OK();

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972780238


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   (of course you must arrange for the `std::function` to be kept alive, however)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972778188


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);

Review Comment:
   I would still favor returning a signed integer here, ergo `int64_t`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1241880768

   @pitrou any idea why is [overloading not allowed](https://github.com/apache/arrow/runs/8258288999?check_suite_focus=true#step:6:1799)? It builds and works ok locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r967021574


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);

Review Comment:
   We're allowed to convert integers :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r967406202


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);

Review Comment:
   I forget sometimes :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r973570799


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   I'll take a look in the next days...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r976141800


##########
cpp/src/arrow/memory_pool_test.cc:
##########
@@ -168,7 +169,103 @@ TEST(Jemalloc, SetDirtyPageDecayMillis) {
 #ifdef ARROW_JEMALLOC
   ASSERT_OK(jemalloc_set_decay_ms(0));
 #else
-  ASSERT_RAISES(Invalid, jemalloc_set_decay_ms(0));
+  ASSERT_RAISES(NotImplemented, jemalloc_set_decay_ms(0));
+#endif
+}
+
+TEST(Jemalloc, GetAllocationStats) {
+#ifdef ARROW_JEMALLOC
+  uint8_t* data;
+  int64_t allocated, active, metadata, resident, mapped, retained, allocated0, active0,
+      metadata0, resident0, mapped0, retained0;
+  int64_t thread_allocated, thread_deallocated, thread_peak_read, thread_allocated0,
+      thread_deallocated0, thread_peak_read0;
+  auto pool = default_memory_pool();

Review Comment:
   Nit, but why not leave it null here? It's initialized on the following line.
   ```suggestion
     MemoryPool* pool = nullptr;
   ```



##########
cpp/src/arrow/memory_pool_test.cc:
##########
@@ -168,7 +169,103 @@ TEST(Jemalloc, SetDirtyPageDecayMillis) {
 #ifdef ARROW_JEMALLOC
   ASSERT_OK(jemalloc_set_decay_ms(0));
 #else
-  ASSERT_RAISES(Invalid, jemalloc_set_decay_ms(0));
+  ASSERT_RAISES(NotImplemented, jemalloc_set_decay_ms(0));
+#endif
+}
+
+TEST(Jemalloc, GetAllocationStats) {
+#ifdef ARROW_JEMALLOC
+  uint8_t* data;
+  int64_t allocated, active, metadata, resident, mapped, retained, allocated0, active0,
+      metadata0, resident0, mapped0, retained0;
+  int64_t thread_allocated, thread_deallocated, thread_peak_read, thread_allocated0,
+      thread_deallocated0, thread_peak_read0;
+  auto pool = default_memory_pool();
+  ABORT_NOT_OK(jemalloc_memory_pool(&pool));
+  ASSERT_EQ("jemalloc", pool->backend_name());
+
+  // Record stats before allocating
+  ASSERT_OK_AND_ASSIGN(allocated0, jemalloc_get_stat("stats.allocated"));
+  ASSERT_OK_AND_ASSIGN(active0, jemalloc_get_stat("stats.active"));
+  ASSERT_OK_AND_ASSIGN(metadata0, jemalloc_get_stat("stats.metadata"));
+  ASSERT_OK_AND_ASSIGN(resident0, jemalloc_get_stat("stats.resident"));
+  ASSERT_OK_AND_ASSIGN(mapped0, jemalloc_get_stat("stats.mapped"));
+  ASSERT_OK_AND_ASSIGN(retained0, jemalloc_get_stat("stats.retained"));
+  ASSERT_OK_AND_ASSIGN(thread_allocated0, jemalloc_get_stat("thread.allocated"));
+  ASSERT_OK_AND_ASSIGN(thread_deallocated0, jemalloc_get_stat("thread.deallocated"));
+  ASSERT_OK_AND_ASSIGN(thread_peak_read0, jemalloc_get_stat("thread.peak.read"));
+
+  // Allocate memory
+  ASSERT_OK(jemalloc_set_decay_ms(10000));

Review Comment:
   Why this? If this test depends on timing it may fail on some slow CI configurations...



##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,59 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC

Review Comment:
   I don't think the ifdef is necessary? This file will only be compiled if jemalloc is enabled.



##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,59 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<int64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  // Update the statistics cached by mallctl.
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, nullptr, 0);
+
+  if (err) {
+    return arrow::internal::IOErrorFromErrno(err, "Failed retrieving ", &name);
+  }
+
+  return value;
+}
+
+Status jemalloc_peak_reset() {
+  int err = mallctl("thread.peak.reset", nullptr, nullptr, nullptr, 0);
+  return err ? arrow::internal::IOErrorFromErrno(err, "Failed resetting thread.peak.")
+             : Status::OK();
+}
+
+Result<std::string> jemalloc_stats_string(const char* opts) {
+  std::string stats;
+  auto write_cb = [&stats](const char* str) { stats.append(str); };
+  ARROW_UNUSED(jemalloc_stats_print(write_cb, opts));
+  return stats;
+}
+
+Status jemalloc_stats_print(const char* opts) {
+  malloc_stats_print(nullptr, nullptr, opts);
+  return Status::OK();
+}
+
+Status jemalloc_stats_print(std::function<void(const char*)> write_cb, const char* opts) {
+  auto cb_wrapper = [](void* opaque, const char* str) {
+    (*static_cast<std::function<void(const char*)>*>(opaque))(str);
+  };
+  if (write_cb) {

Review Comment:
   Is this required? I think the caller should make sure that `write_cb` is initialized.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972775560


##########
cpp/src/arrow/memory_pool.cc:
##########
@@ -665,6 +665,20 @@ MemoryPool* default_memory_pool() {
 Status jemalloc_set_decay_ms(int ms) {
   return Status::Invalid("jemalloc support is not built");
 }
+
+Status jemalloc_get_stat(const char* name, size_t* out) {
+  return Status::Invalid("jemalloc support is not built");

Review Comment:
   TBH, I'd rather switch them all to `NotImplemented`, and reserve `Invalid` for when the parameter given to the function is invalid.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1248535187

   @pitrou could you please take another look?
   The failing tests are just me setting reasonable boundaries for expected allocations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1175348313

   https://issues.apache.org/jira/browse/ARROW-16981


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r914104774


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -151,6 +151,26 @@ Status jemalloc_set_decay_ms(int ms) {
   return Status::OK();
 }
 
+Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp, void* newp,
+                        size_t newlen) {
+#ifdef ARROW_JEMALLOC
+  int res = mallctl(name, oldp, oldlenp, newp, newlen);
+  return res ? Status::UnknownError(res) : Status::OK();

Review Comment:
   use std::strerror like jemalloc_set_decay_ms does above?
   
   Also, maybe rewrite set_decay_ms in terms of this function?



##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,

Review Comment:
   nit: docstrings, and group with the jemalloc function above?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972790634


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<uint64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, NULLPTR, 0);

Review Comment:
   Nit: `NULLPTR` is only necessary in header files, can use `nullptr` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r973572463


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -19,9 +19,11 @@
 
 #include <atomic>
 #include <cstdint>
+#include <functional>

Review Comment:
   Depending on if we use `function` or not this should be kept or removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r963108380


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,

Review Comment:
   Higher level API does make sense. I suppose we also want to provide some tests with useful examples.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1254308246

   Benchmark runs are scheduled for baseline = bd433c014990d708c084e1f3d166c93116fdbd2f and contender = 8d11e3ddb69db4499707c0aff2e6e35f49f7dfb2. 8d11e3ddb69db4499707c0aff2e6e35f49f7dfb2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a85527bf20c1490d9cb6393918f388ac...4bd7688cb056435a97a6905c2b157a64/)
   [Failed :arrow_down:0.14% :arrow_up:0.1%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/828f7e57674c4a52b65fa20f403fa39e...da08cd15b13946ef82b71fd9beeab822/)
   [Failed :arrow_down:0.28% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e67ced01342b49cbb6018bd4e6ce643f...27ceb82d921d4775b5025b0a481612c3/)
   [Finished :arrow_down:0.78% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/5ea23ca9deb74119b83b440aea3ba70f...b53db38c947b47f3977c2944d74ed041/)
   Buildkite builds:
   [Finished] [`8d11e3dd` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1535)
   [Failed] [`8d11e3dd` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1552)
   [Failed] [`8d11e3dd` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1535)
   [Finished] [`8d11e3dd` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1549)
   [Finished] [`bd433c01` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1534)
   [Finished] [`bd433c01` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1551)
   [Failed] [`bd433c01` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1534)
   [Finished] [`bd433c01` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1548)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r963612018


##########
cpp/src/arrow/memory_pool.cc:
##########
@@ -665,6 +665,20 @@ MemoryPool* default_memory_pool() {
 Status jemalloc_set_decay_ms(int ms) {
   return Status::Invalid("jemalloc support is not built");
 }
+
+Status jemalloc_get_stat(const char* name, size_t* out) {
+  return Status::Invalid("jemalloc support is not built");

Review Comment:
   nit: why Invalid here and NotImplemented below?



##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +175,48 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+Status jemalloc_get_stat(const char* name, size_t* out);
+
+/// \brief The mallctl() function provides a general interface for introspecting the
+/// memory allocator, as well as setting modifiable parameters and triggering actions.
+/// The period-separated name argument specifies a location in a tree-structured
+/// namespace; see the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// more information on the tree contents. To read a value, pass a pointer via oldp to
+/// adequate space to contain the value, and a pointer to its length via oldlenp;
+/// otherwise pass NULL and NULL. Similarly, to write a value, pass a pointer to the
+/// value via newp, and its length via newlen; otherwise pass NULL and 0.
+///
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,
+                                     void* newp, size_t newlen);
+
+/// \brief The malloc_stats_print() function writes summary statistics via the write_cb

Review Comment:
   Maybe we shouldn't directly copy from their documentation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r976373194


##########
cpp/src/arrow/memory_pool_test.cc:
##########
@@ -168,7 +169,103 @@ TEST(Jemalloc, SetDirtyPageDecayMillis) {
 #ifdef ARROW_JEMALLOC
   ASSERT_OK(jemalloc_set_decay_ms(0));
 #else
-  ASSERT_RAISES(Invalid, jemalloc_set_decay_ms(0));
+  ASSERT_RAISES(NotImplemented, jemalloc_set_decay_ms(0));
+#endif
+}
+
+TEST(Jemalloc, GetAllocationStats) {
+#ifdef ARROW_JEMALLOC
+  uint8_t* data;
+  int64_t allocated, active, metadata, resident, mapped, retained, allocated0, active0,
+      metadata0, resident0, mapped0, retained0;
+  int64_t thread_allocated, thread_deallocated, thread_peak_read, thread_allocated0,
+      thread_deallocated0, thread_peak_read0;
+  auto pool = default_memory_pool();
+  ABORT_NOT_OK(jemalloc_memory_pool(&pool));
+  ASSERT_EQ("jemalloc", pool->backend_name());
+
+  // Record stats before allocating
+  ASSERT_OK_AND_ASSIGN(allocated0, jemalloc_get_stat("stats.allocated"));
+  ASSERT_OK_AND_ASSIGN(active0, jemalloc_get_stat("stats.active"));
+  ASSERT_OK_AND_ASSIGN(metadata0, jemalloc_get_stat("stats.metadata"));
+  ASSERT_OK_AND_ASSIGN(resident0, jemalloc_get_stat("stats.resident"));
+  ASSERT_OK_AND_ASSIGN(mapped0, jemalloc_get_stat("stats.mapped"));
+  ASSERT_OK_AND_ASSIGN(retained0, jemalloc_get_stat("stats.retained"));
+  ASSERT_OK_AND_ASSIGN(thread_allocated0, jemalloc_get_stat("thread.allocated"));
+  ASSERT_OK_AND_ASSIGN(thread_deallocated0, jemalloc_get_stat("thread.deallocated"));
+  ASSERT_OK_AND_ASSIGN(thread_peak_read0, jemalloc_get_stat("thread.peak.read"));
+
+  // Allocate memory
+  ASSERT_OK(jemalloc_set_decay_ms(10000));

Review Comment:
   This was a leftover from something I was trying earlier. Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966987541


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);
+
+/// \brief Get basic allocation statistics from jemalloc by passing a pointer
+/// This is useful for avoiding the overhead of repeated copy calls.
+Status jemalloc_get_statp(const char* name, uint64_t& out);
+
+/// \brief Resets the counter for bytes allocated in the calling thread to zero.

Review Comment:
   ```suggestion
   /// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1244007751

   Added [ARROW-17685](https://issues.apache.org/jira/browse/ARROW-17685) for the Python wrapper.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966343675


##########
cpp/src/arrow/memory_pool.cc:
##########
@@ -665,6 +665,20 @@ MemoryPool* default_memory_pool() {
 Status jemalloc_set_decay_ms(int ms) {
   return Status::Invalid("jemalloc support is not built");
 }
+
+Status jemalloc_get_stat(const char* name, size_t* out) {
+  return Status::Invalid("jemalloc support is not built");

Review Comment:
   Good point, switching all to `Invalid`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1249093991

   > The failing tests are just me setting reasonable boundaries for expected allocations.
   
   Well, we should get them to pass in any case, so perhaps relax your reasonable expectations? :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972904264


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<uint64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);

Review Comment:
   That looks fine to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
benibus commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r973607559


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   Yep! I'll take a look at it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972779545


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<uint64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(std::function<void(void*, const char*)>* write_cb,
+                            void* cbopaque, const char* opts = "");

Review Comment:
   There is no need to pass a `void*` as C++ conveniently allows to pass a closure (or any callable object) as a `std::function`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1242544651

   Do we want Python binding for these as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972786764


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<uint64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, NULLPTR, 0);
+
+  if (err) {
+    return arrow::internal::IOErrorFromErrno(err, "Failed retrieving ", &name);
+  }
+
+  return value;
+}
+
+Status jemalloc_peak_reset() {
+  int err = mallctl("thread.peak.reset", NULLPTR, NULLPTR, NULLPTR, 0);
+  return err ? arrow::internal::IOErrorFromErrno(err, "Failed resetting thread.peak.")
+             : Status::OK();
+}
+
+Result<std::string> jemalloc_stats_print(const char* opts) {
+  std::string stats;
+  auto write_cb = [](void* opaque, const char* str) {
+    reinterpret_cast<std::string*>(opaque)->append(str);
+  };
+  malloc_stats_print(write_cb, &stats, opts);
+  return stats;
+}
+
+Status jemalloc_stats_print(std::function<void(void* opaque, const char* buf)>* write_cb,
+                            void* cbopaque, const char* opts) {
+  malloc_stats_print(reinterpret_cast<void (*)(void*, const char*)>(write_cb), cbopaque,

Review Comment:
   ?? You cannot blindly cast a `std::function` to a plain function pointer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r972901436


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<uint64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);

Review Comment:
   Added: `// Update the statistics cached by mallctl.`. Do you think that's enough?



##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<uint64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, NULLPTR, 0);

Review Comment:
   Changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r974193513


##########
cpp/src/arrow/flight/sql/types.h:
##########
@@ -19,6 +19,7 @@
 
 #include <cstdint>
 #include <iosfwd>
+#include <optional>

Review Comment:
   This was fixed upstream.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r976371283


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,59 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC

Review Comment:
   Indeed! Removing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966987989


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);
+
+/// \brief Get basic allocation statistics from jemalloc by passing a pointer
+/// This is useful for avoiding the overhead of repeated copy calls.
+Status jemalloc_get_statp(const char* name, uint64_t& out);
+
+/// \brief Resets the counter for bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+Status jemalloc_peak_reset();
+
+/// \brief Prints summary statistics in human-readable form. See malloc_stats_print

Review Comment:
   Let's keep the brief form brief. Also, where does it print to? Can it be changed?
   ```suggestion
   /// \brief Print summary statistics in human-readable form.
   ///
   /// See malloc_stats_print
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r967018539


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);
+
+/// \brief Get basic allocation statistics from jemalloc by passing a pointer
+/// This is useful for avoiding the overhead of repeated copy calls.
+Status jemalloc_get_statp(const char* name, uint64_t& out);
+
+/// \brief Resets the counter for bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+Status jemalloc_peak_reset();
+
+/// \brief Prints summary statistics in human-readable form. See malloc_stats_print

Review Comment:
   This prints to stdout and while `malloc_stats_print` gives other options we probably don't need them at the moment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1253687406

   Thanks a lot @rok and @benibus !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou merged PR #13516:
URL: https://github.com/apache/arrow/pull/13516


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1175348326

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r917935181


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -151,6 +151,26 @@ Status jemalloc_set_decay_ms(int ms) {
   return Status::OK();
 }
 
+Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp, void* newp,
+                        size_t newlen) {
+#ifdef ARROW_JEMALLOC
+  int res = mallctl(name, oldp, oldlenp, newp, newlen);
+  return res ? Status::UnknownError(res) : Status::OK();

Review Comment:
   Better, use `IOErrorFromErrno`.



##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,

Review Comment:
   For example:
   ```c++
   Status jemalloc_get_stat(const char* name, bool* out);
   Status jemalloc_get_stat(const char* name, int64_t* out);
   Status jemalloc_get_stat(const char* name, uint64_t* out);
   Status jemalloc_get_stat(const char* name, const char** out);
   ```
   (passing the result as out-parameter to allow for several signatures; we could also return a variant but that would add a heavy header inclusion)



##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,

Review Comment:
   I'm not sure, but I would favour exposing higher-level APIs rather than this raw C ugliness.



##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,
+                                     void* newp, size_t newlen);
+
+ARROW_EXPORT Status jemalloc_stats_print(void (*write_cb)(void*, const char*),
+                                         void* cbopaque, const char* opts);

Review Comment:
   Wow, can this take a `std::function` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966342012


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -180,6 +180,12 @@ Status jemalloc_set_decay_ms(int ms);
 /// May return NotImplemented if mimalloc is not available.
 ARROW_EXPORT Status mimalloc_memory_pool(MemoryPool** out);
 
+ARROW_EXPORT Status jemalloc_mallctl(const char* name, void* oldp, size_t* oldlenp,

Review Comment:
   Changed. I've gone with a subset as I don't think we'll need `int64_t` and `int64_t` for now.
   Should these be `Status` or `Result`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r974811340


##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<int64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  // Update the statistics cached by mallctl.
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, nullptr, 0);
+
+  if (err) {
+    return arrow::internal::IOErrorFromErrno(err, "Failed retrieving ", &name);
+  }
+
+  return value;
+}
+
+Status jemalloc_peak_reset() {
+  int err = mallctl("thread.peak.reset", nullptr, nullptr, nullptr, 0);
+  return err ? arrow::internal::IOErrorFromErrno(err, "Failed resetting thread.peak.")
+             : Status::OK();
+}
+
+Result<std::string> jemalloc_stats_print(const char* opts) {
+  std::string stats;
+  auto write_cb = [](void* opaque, const char* str) {
+    reinterpret_cast<std::string*>(opaque)->append(str);
+  };
+  malloc_stats_print(write_cb, &stats, opts);
+  return stats;
+}
+
+Status jemalloc_stats_print(void (*write_cb)(void*, const char*), void* cbopaque,
+                            const char* opts) {
+  malloc_stats_print(write_cb, cbopaque, opts);
+  return Status::OK();
+}

Review Comment:
   Oh, got it. Good to know, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
benibus commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r974552546


##########
cpp/src/arrow/memory_pool.cc:
##########
@@ -667,8 +667,26 @@ MemoryPool* default_memory_pool() {
 
 #ifndef ARROW_JEMALLOC
 Status jemalloc_set_decay_ms(int ms) {
-  return Status::Invalid("jemalloc support is not built");
+  return Status::NotImplemented("jemalloc support is not built");
 }
+
+Result<int64_t> jemalloc_get_stat(const char* name) {
+  return Status::NotImplemented("jemalloc support is not built");
+}
+
+Status jemalloc_peak_reset() {
+  return Status::NotImplemented("jemalloc support is not built");
+}
+
+Status jemalloc_stats_print(void (*write_cb)(void*, const char*), void* cbopaque,
+                            const char* opts) {
+  return Status::NotImplemented("jemalloc support is not built");
+}
+
+Result<std::string> jemalloc_stats_print(const char* opts) {
+  return Status::NotImplemented("jemalloc support is not built");
+}

Review Comment:
   ```suggestion
   Status jemalloc_stats_print(const char* opts) {
     return Status::NotImplemented("jemalloc support is not built");
   }
   
   Status jemalloc_stats_print(std::function<void(const char*)> write_cb,
                               const char* opts) {
     return Status::NotImplemented("jemalloc support is not built");
   }
   
   Result<std::string> jemalloc_stats_string(const char* opts) {
     return Status::NotImplemented("jemalloc support is not built");
   }
   ```



##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +177,31 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic statistics from jemalloc's mallctl.
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+ARROW_EXPORT
+Result<int64_t> jemalloc_get_stat(const char* name);
+
+/// \brief Reset the counter for peak bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+ARROW_EXPORT
+Status jemalloc_peak_reset();
+
+/// \brief Print summary statistics in human-readable form to stderr.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Status jemalloc_stats_print(void (*write_cb)(void*, const char*), void* cbopaque,
+                            const char* opts = "");
+
+/// \brief Get summary statistics in human-readable form.
+/// See malloc_stats_print documentation in jemalloc project documentation for
+/// available opt flags.
+ARROW_EXPORT
+Result<std::string> jemalloc_stats_print(const char* opts = "");

Review Comment:
   ```suggestion
   /// \brief Print summary statistics in human-readable form to stderr.
   /// See malloc_stats_print documentation in jemalloc project documentation for
   /// available opt flags.
   ARROW_EXPORT
   Status jemalloc_stats_print(const char* opts = "");
   
   /// \brief Print summary statistics in human-readable form using a callback
   /// See malloc_stats_print documentation in jemalloc project documentation for
   /// available opt flags.
   ARROW_EXPORT
   Status jemalloc_stats_print(std::function<void(const char*)> write_cb,
                               const char* opts = "");
   
   /// \brief Get summary statistics in human-readable form.
   /// See malloc_stats_print documentation in jemalloc project documentation for
   /// available opt flags.
   ARROW_EXPORT
   Result<std::string> jemalloc_stats_string(const char* opts = "");
   ```
   Only really introduced `jemalloc_stats_string` to differentiate it from the [proposed] default version of `jemalloc_stats_print`.



##########
cpp/src/arrow/memory_pool_jemalloc.cc:
##########
@@ -153,4 +154,52 @@ Status jemalloc_set_decay_ms(int ms) {
 
 #undef RETURN_IF_JEMALLOC_ERROR
 
+#ifdef ARROW_JEMALLOC
+Result<int64_t> jemalloc_get_stat(const char* name) {
+  size_t sz = sizeof(uint64_t);
+  int err;
+  uint64_t value;
+
+  // Update the statistics cached by mallctl.
+  if (std::strcmp(name, "stats.allocated") == 0 ||
+      std::strcmp(name, "stats.active") == 0 ||
+      std::strcmp(name, "stats.metadata") == 0 ||
+      std::strcmp(name, "stats.resident") == 0 ||
+      std::strcmp(name, "stats.mapped") == 0 ||
+      std::strcmp(name, "stats.retained") == 0) {
+    uint64_t epoch;
+    mallctl("epoch", &epoch, &sz, &epoch, sz);
+  }
+
+  err = mallctl(name, &value, &sz, nullptr, 0);
+
+  if (err) {
+    return arrow::internal::IOErrorFromErrno(err, "Failed retrieving ", &name);
+  }
+
+  return value;
+}
+
+Status jemalloc_peak_reset() {
+  int err = mallctl("thread.peak.reset", nullptr, nullptr, nullptr, 0);
+  return err ? arrow::internal::IOErrorFromErrno(err, "Failed resetting thread.peak.")
+             : Status::OK();
+}
+
+Result<std::string> jemalloc_stats_print(const char* opts) {
+  std::string stats;
+  auto write_cb = [](void* opaque, const char* str) {
+    reinterpret_cast<std::string*>(opaque)->append(str);
+  };
+  malloc_stats_print(write_cb, &stats, opts);
+  return stats;
+}
+
+Status jemalloc_stats_print(void (*write_cb)(void*, const char*), void* cbopaque,
+                            const char* opts) {
+  malloc_stats_print(write_cb, cbopaque, opts);
+  return Status::OK();
+}

Review Comment:
   ```suggestion
   Result<std::string> jemalloc_stats_string(const char* opts) {
     std::string stats;
     auto write_cb = [&stats](const char* str) { stats.append(str); };
     ARROW_UNUSED(jemalloc_stats_print(write_cb, opts));
     return stats;
   }
   
   Status jemalloc_stats_print(const char* opts) {
     malloc_stats_print(nullptr, nullptr, opts);
     return Status::OK();
   }
   
   Status jemalloc_stats_print(std::function<void(const char*)> write_cb,
                               const char* opts) {
     auto cb_wrapper = [](void* opaque, const char* str) {
       (*static_cast<std::function<void(const char*)>*>(opaque))(str);
     };
     if (write_cb) {
       malloc_stats_print(cb_wrapper, &write_cb, opts);
     }
     return Status::OK();
   }
   ```
   You can't pass a `std::function` as a function pointer since it's actually a class that holds a "callable object" (which may or may not be an actual function pointer). Instead, you can provide a wrapper callback that invokes it manually.



##########
cpp/src/arrow/memory_pool_test.cc:
##########
@@ -168,7 +169,107 @@ TEST(Jemalloc, SetDirtyPageDecayMillis) {
 #ifdef ARROW_JEMALLOC
   ASSERT_OK(jemalloc_set_decay_ms(0));
 #else
-  ASSERT_RAISES(Invalid, jemalloc_set_decay_ms(0));
+  ASSERT_RAISES(NotImplemented, jemalloc_set_decay_ms(0));
+#endif
+}
+
+TEST(Jemalloc, GetAllocationStats) {
+#ifdef ARROW_JEMALLOC
+  uint8_t* data;
+  int64_t allocated, active, metadata, resident, mapped, retained, allocated0, active0,
+      metadata0, resident0, mapped0, retained0;
+  int64_t thread_allocated, thread_deallocated, thread_peak_read, thread_allocated0,
+      thread_deallocated0, thread_peak_read0;
+  auto pool = default_memory_pool();
+  ABORT_NOT_OK(jemalloc_memory_pool(&pool));
+  ASSERT_EQ("jemalloc", pool->backend_name());
+
+  // Record stats before allocating
+  ASSERT_OK_AND_ASSIGN(allocated0, jemalloc_get_stat("stats.allocated"));
+  ASSERT_OK_AND_ASSIGN(active0, jemalloc_get_stat("stats.active"));
+  ASSERT_OK_AND_ASSIGN(metadata0, jemalloc_get_stat("stats.metadata"));
+  ASSERT_OK_AND_ASSIGN(resident0, jemalloc_get_stat("stats.resident"));
+  ASSERT_OK_AND_ASSIGN(mapped0, jemalloc_get_stat("stats.mapped"));
+  ASSERT_OK_AND_ASSIGN(retained0, jemalloc_get_stat("stats.retained"));
+  ASSERT_OK_AND_ASSIGN(thread_allocated0, jemalloc_get_stat("thread.allocated"));
+  ASSERT_OK_AND_ASSIGN(thread_deallocated0, jemalloc_get_stat("thread.deallocated"));
+  ASSERT_OK_AND_ASSIGN(thread_peak_read0, jemalloc_get_stat("thread.peak.read"));
+
+  // Allocate memory
+  ASSERT_OK(jemalloc_set_decay_ms(10000));
+  ASSERT_OK(pool->Allocate(1025, &data));
+  ASSERT_EQ(pool->bytes_allocated(), 1025);
+  ASSERT_OK(pool->Reallocate(1025, 1023, &data));
+  ASSERT_EQ(pool->bytes_allocated(), 1023);
+
+  // Record stats after allocating
+  ASSERT_OK_AND_ASSIGN(allocated, jemalloc_get_stat("stats.allocated"));
+  ASSERT_OK_AND_ASSIGN(active, jemalloc_get_stat("stats.active"));
+  ASSERT_OK_AND_ASSIGN(metadata, jemalloc_get_stat("stats.metadata"));
+  ASSERT_OK_AND_ASSIGN(resident, jemalloc_get_stat("stats.resident"));
+  ASSERT_OK_AND_ASSIGN(mapped, jemalloc_get_stat("stats.mapped"));
+  ASSERT_OK_AND_ASSIGN(retained, jemalloc_get_stat("stats.retained"));
+  ASSERT_OK_AND_ASSIGN(thread_allocated, jemalloc_get_stat("thread.allocated"));
+  ASSERT_OK_AND_ASSIGN(thread_deallocated, jemalloc_get_stat("thread.deallocated"));
+  ASSERT_OK_AND_ASSIGN(thread_peak_read, jemalloc_get_stat("thread.peak.read"));
+
+  // Check allocated stats pre-allocation
+  ASSERT_NEAR(allocated0, 120000, 100000);
+  ASSERT_NEAR(active0, 75000, 70000);
+  ASSERT_NEAR(metadata0, 3000000, 1000000);
+  ASSERT_NEAR(resident0, 3000000, 1000000);
+  ASSERT_NEAR(mapped0, 6500000, 1000000);
+  ASSERT_NEAR(retained0, 1500000, 2000000);
+
+  // Check allocated stats change due to allocation
+  ASSERT_NEAR(allocated - allocated0, 70000, 50000);
+  ASSERT_NEAR(active - active0, 100000, 90000);
+  ASSERT_NEAR(metadata - metadata0, 500, 460);
+  ASSERT_NEAR(resident - resident0, 120000, 110000);
+  ASSERT_NEAR(mapped - mapped0, 100000, 90000);
+  ASSERT_NEAR(retained - retained0, 0, 40000);
+
+  ASSERT_NEAR(thread_peak_read - thread_peak_read0, 1024, 700);
+  ASSERT_NEAR(thread_allocated - thread_allocated0, 2500, 500);
+  ASSERT_EQ(thread_deallocated - thread_deallocated0, 1280);
+
+  // Resetting thread peak read metric
+  ASSERT_OK(pool->Allocate(12560, &data));
+  ASSERT_OK_AND_ASSIGN(thread_peak_read, jemalloc_get_stat("thread.peak.read"));
+  ASSERT_NEAR(thread_peak_read, 15616, 1000);
+  ASSERT_OK(jemalloc_peak_reset());
+  ASSERT_OK(pool->Allocate(1256, &data));
+  ASSERT_OK_AND_ASSIGN(thread_peak_read, jemalloc_get_stat("thread.peak.read"));
+  ASSERT_NEAR(thread_peak_read, 1280, 100);
+
+  // Print statistics to stderr
+  ASSERT_OK(jemalloc_stats_print(nullptr, nullptr, "J"));
+
+  // Read statistics into std::string
+  ASSERT_OK_AND_ASSIGN(std::string stats, jemalloc_stats_print("Jax"));
+
+  // Read statistics into std::string with a lambda
+  std::string stats2;
+  auto write_cb = [](void* opaque, const char* str) {
+    reinterpret_cast<std::string*>(opaque)->append(str);
+  };
+  ASSERT_OK(jemalloc_stats_print(write_cb, &stats2, "Jax"));
+
+  ASSERT_EQ(stats.rfind("{\"jemalloc\":{\"version\"", 0), 0);
+  ASSERT_EQ(stats2.rfind("{\"jemalloc\":{\"version\"", 0), 0);
+  ASSERT_EQ(stats.substr(0, 100), stats2.substr(0, 100));
+#else
+  std::string stats;
+  auto write_cb = [](void* opaque, const char* str) {
+    reinterpret_cast<std::string*>(opaque)->append(str);
+  };
+  ASSERT_RAISES(NotImplemented, jemalloc_get_stat("thread.peak.read"));
+  ASSERT_RAISES(NotImplemented, jemalloc_get_stat("stats.allocated"));
+  ASSERT_RAISES(NotImplemented, jemalloc_get_stat("stats.allocated"));
+  ASSERT_RAISES(NotImplemented, jemalloc_get_stat("stats.allocatedp"));
+  ASSERT_RAISES(NotImplemented, jemalloc_peak_reset());
+  ASSERT_RAISES(NotImplemented, jemalloc_stats_print(write_cb, &stats, "Jax"));

Review Comment:
   ```suggestion
     ASSERT_OK(jemalloc_stats_print("J"));
   
     // Read statistics into std::string
     ASSERT_OK_AND_ASSIGN(std::string stats, jemalloc_stats_string("Jax"));
   
     // Read statistics into std::string with a lambda
     std::string stats2;
     auto write_cb = [&stats2](const char* str) { stats2.append(str); };
     ASSERT_OK(jemalloc_stats_print(write_cb, "Jax"));
   
     ASSERT_EQ(stats.rfind("{\"jemalloc\":{\"version\"", 0), 0);
     ASSERT_EQ(stats2.rfind("{\"jemalloc\":{\"version\"", 0), 0);
     ASSERT_EQ(stats.substr(0, 100), stats2.substr(0, 100));
   #else
     std::string stats;
     auto write_cb = [&stats](const char* str) { stats.append(str); };
     ASSERT_RAISES(NotImplemented, jemalloc_get_stat("thread.peak.read"));
     ASSERT_RAISES(NotImplemented, jemalloc_get_stat("stats.allocated"));
     ASSERT_RAISES(NotImplemented, jemalloc_get_stat("stats.allocated"));
     ASSERT_RAISES(NotImplemented, jemalloc_get_stat("stats.allocatedp"));
     ASSERT_RAISES(NotImplemented, jemalloc_peak_reset());
     ASSERT_RAISES(NotImplemented, jemalloc_stats_print(write_cb, "Jax"));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13516:
URL: https://github.com/apache/arrow/pull/13516#issuecomment-1253551361

   Thanks for the review @pitrou! I've addressed your points/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r967023062


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);
+
+/// \brief Get basic allocation statistics from jemalloc by passing a pointer
+/// This is useful for avoiding the overhead of repeated copy calls.
+Status jemalloc_get_statp(const char* name, uint64_t& out);
+
+/// \brief Resets the counter for bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+Status jemalloc_peak_reset();
+
+/// \brief Prints summary statistics in human-readable form. See malloc_stats_print

Review Comment:
   Apparently it would print to stderr rather than stdout?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r966987312


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);

Review Comment:
   As per the coding conventions, we avoid non-const references, so the output parameter should be a pointer.
   Also IMHO it doesn't make sense to expose different integer sizes, or to default to unsigned. So I would make it:
   ```c++
   Status jemalloc_get_stat(const char* name, int64_t* out);
   ```
   or perhaps even (but it's less flexible if there ever are non-integer statistics:
   ```c++
   Result<int64_t> jemalloc_get_stat(const char* name);
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13516: ARROW-16981: [C++] Expose jemalloc statistics for logging

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13516:
URL: https://github.com/apache/arrow/pull/13516#discussion_r967405971


##########
cpp/src/arrow/memory_pool.h:
##########
@@ -175,6 +176,25 @@ ARROW_EXPORT Status jemalloc_memory_pool(MemoryPool** out);
 ARROW_EXPORT
 Status jemalloc_set_decay_ms(int ms);
 
+/// \brief Get basic allocation statistics from jemalloc
+/// See the MALLCTL NAMESPACE section in jemalloc project documentation for
+/// available stats.
+Status jemalloc_get_stat(const char* name, size_t& out);
+Status jemalloc_get_stat(const char* name, uint64_t& out);
+
+/// \brief Get basic allocation statistics from jemalloc by passing a pointer
+/// This is useful for avoiding the overhead of repeated copy calls.
+Status jemalloc_get_statp(const char* name, uint64_t& out);
+
+/// \brief Resets the counter for bytes allocated in the calling thread to zero.
+/// This affects subsequent calls to thread.peak.read, but not the values returned by
+/// thread.allocated or thread.deallocated.
+Status jemalloc_peak_reset();
+
+/// \brief Prints summary statistics in human-readable form. See malloc_stats_print

Review Comment:
   Ah yes, it's stderr indeed!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org