You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/28 12:43:56 UTC

[GitHub] [arrow] westonpace opened a new pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

westonpace opened a new pull request #10186:
URL: https://github.com/apache/arrow/pull/10186


   @pitrou What are your thoughts on this.  It's not urgent (benchmarking can always use the system allocator in the meantime if they are having RAM issues and I think they reduced already which benchmarks they run on the RAM-limited servers)
   
   With mimalloc the following command has a max RSS of ~16GB.
   ```
   conbench file-read fanniemae_2016Q4 --all=true
   ```
   
   If I add a call to release_unused between runs then the max RSS drops to ~10GB.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r622885877



##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -253,6 +253,8 @@ class SystemAllocator {
 #endif
     }
   }
+
+  static void ReleaseUnused() {}

Review comment:
       Note the glibc has [malloc_trim](https://man7.org/linux/man-pages/man3/malloc_trim.3.html). This can probably be used depending if the `__GLIBC__` macro is present.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r622155631



##########
File path: cpp/src/arrow/memory_pool.h
##########
@@ -87,6 +87,11 @@ class ARROW_EXPORT MemoryPool {
   ///   faster deallocation if supported by its backend.
   virtual void Free(uint8_t* buffer, int64_t size) = 0;
 
+  /// Returns unused memory to the OS

Review comment:
       "Return"

##########
File path: python/pyarrow/memory.pxi
##########
@@ -36,6 +36,15 @@ cdef class MemoryPool(_Weakrefable):
     cdef void init(self, CMemoryPool* pool):
         self.pool = pool
 
+    def release_unused(self):
+        """
+        Attempts to free any memory being held onto by an optimized allocator.
+        This function should not be called except potentially for benchmarking
+        or debugging as it could be expensive and detrimental to performance.
+        """
+        cdef CMemoryPool* pool = c_get_memory_pool()
+        pool.ReleaseUnused()

Review comment:
       I think we can enclose this in `with nogil` as releasing memory might be an expensive operation?

##########
File path: cpp/src/arrow/memory_pool.h
##########
@@ -87,6 +87,11 @@ class ARROW_EXPORT MemoryPool {
   ///   faster deallocation if supported by its backend.
   virtual void Free(uint8_t* buffer, int64_t size) = 0;
 
+  /// Returns unused memory to the OS
+  ///
+  /// Only applies to allocators that hold onto unused memory

Review comment:
       Mention that this is a best effort only? Even an allocator which exposes such a facility may not release all "unused" memory (for example because of memory fragmentation).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r624104610



##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -300,6 +302,8 @@ class JemallocAllocator {
       dallocx(ptr, MALLOCX_ALIGN(kAlignment));
     }
   }
+
+  static void ReleaseUnused() {}

Review comment:
       I added a jemalloc implementation.  It does not appear to do anything in my manual testing but it is possible that jemalloc was simply just already releases pages to the OS.  Since this is best-effort anyways I think it is ok.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r622890225



##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -300,6 +302,8 @@ class JemallocAllocator {
       dallocx(ptr, MALLOCX_ALIGN(kAlignment));
     }
   }
+
+  static void ReleaseUnused() {}

Review comment:
       Apparently, we can ask jemalloc to explicitly purge unused pages: https://github.com/jemalloc/jemalloc/issues/1190




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r629475799



##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -35,6 +35,10 @@
 #include "arrow/util/optional.h"
 #include "arrow/util/string.h"
 
+#ifdef __GLIBCXX__

Review comment:
       Looks like you should test for `__GLIBC__` instead. `__GLIBCXX__` is for the GNU libstdc++ (which is used by MinGW).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r624106186



##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -253,6 +253,8 @@ class SystemAllocator {
 #endif
     }
   }
+
+  static void ReleaseUnused() {}

Review comment:
       I added a `malloc_trim` call.  My manual exploration confirmed this triggers a number of `madvise(...MADV_DONTNEED)` calls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou edited a comment on pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#issuecomment-836872587


   @westonpace Can you rebase and also fix the Python lint issue?
   
   Edit: done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#issuecomment-836872587


   @westonpace Can you rebase and also fix the Python lint issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#issuecomment-828424445


   https://issues.apache.org/jira/browse/ARROW-12584


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10186:
URL: https://github.com/apache/arrow/pull/10186


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r629475799



##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -35,6 +35,10 @@
 #include "arrow/util/optional.h"
 #include "arrow/util/string.h"
 
+#ifdef __GLIBCXX__

Review comment:
       Looks like you should include `<features.h>` and test for `__GLIBC__` instead.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10186: ARROW-12584: [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10186:
URL: https://github.com/apache/arrow/pull/10186#discussion_r624106373



##########
File path: python/pyarrow/memory.pxi
##########
@@ -36,6 +36,15 @@ cdef class MemoryPool(_Weakrefable):
     cdef void init(self, CMemoryPool* pool):
         self.pool = pool
 
+    def release_unused(self):
+        """
+        Attempts to free any memory being held onto by an optimized allocator.
+        This function should not be called except potentially for benchmarking
+        or debugging as it could be expensive and detrimental to performance.
+        """
+        cdef CMemoryPool* pool = c_get_memory_pool()
+        pool.ReleaseUnused()

Review comment:
       Done.

##########
File path: cpp/src/arrow/memory_pool.h
##########
@@ -87,6 +87,11 @@ class ARROW_EXPORT MemoryPool {
   ///   faster deallocation if supported by its backend.
   virtual void Free(uint8_t* buffer, int64_t size) = 0;
 
+  /// Returns unused memory to the OS
+  ///
+  /// Only applies to allocators that hold onto unused memory

Review comment:
       Done.

##########
File path: cpp/src/arrow/memory_pool.h
##########
@@ -87,6 +87,11 @@ class ARROW_EXPORT MemoryPool {
   ///   faster deallocation if supported by its backend.
   virtual void Free(uint8_t* buffer, int64_t size) = 0;
 
+  /// Returns unused memory to the OS

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org