Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/22 19:26:41 UTC

[GitHub] [arrow] westonpace commented on issue #9295: Memory Usage increases while Reading the IPC format buffers.

westonpace commented on issue #9295:
URL: https://github.com/apache/arrow/issues/9295#issuecomment-765634730


   Thanks for asking.  There are a number of things to consider when looking at memory allocations by Arrow.  Also, which language are you working with?
   
   Out of the box, Arrow will usually use a third-party allocator (jemalloc or mimalloc).  These allocators can behave in unexpected ways.  For example, they may not return RAM to the OS immediately; they might hold on to it for a while in case they can use it to fulfill an upcoming request.  This makes it difficult to judge from process RAM usage alone whether something is wrong, but there are some things to look for.
   
   Your application should eventually approach a steady state.  If it runs for a long time, RAM usage should level off and stop increasing.  If it keeps growing, that may be evidence of a leak.
   
   Your application should be able to utilize most of the available RAM.
   
   There is a total allocated bytes counter which you can access from the memory pool.  How you do this will depend on the language; in Python, for example, use [pyarrow.total_allocated_bytes](https://arrow.apache.org/docs/python/generated/pyarrow.total_allocated_bytes.html).  This counter shows how many bytes are currently in use, which will probably be less than the number of bytes the allocator has "reserved" from the OS.  It does not include any allocator overhead.  So if you make a call, and then release the RAM used by the call, the total allocated bytes should return to where it was previously.  This counter can be used to check for leaks.
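
As a rough sketch of that kind of leak check in Python: the helper below takes any zero-argument counter, so `pyarrow.total_allocated_bytes` can be passed in directly.  The names `leaked_bytes`, `ipc_round_trip` and `report` are made up for illustration, and the in-memory IPC round trip is only an assumed stand-in for whatever your application actually reads.

```python
import gc


def leaked_bytes(action, counter, repeats=5):
    """Call action() `repeats` times, dropping each result, and return how far
    counter() ends above its starting value (0 means everything was freed)."""
    gc.collect()
    baseline = counter()
    for _ in range(repeats):
        result = action()
        del result      # release whatever the call returned
        gc.collect()    # make sure unreachable Python objects are really gone
    return counter() - baseline


def ipc_round_trip():
    """Hypothetical workload: write a table to the IPC stream format in memory
    and read it back."""
    import pyarrow as pa  # imported here so the helper above works without pyarrow

    table = pa.table({"x": list(range(100_000))})
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    return pa.ipc.open_stream(sink.getvalue()).read_all()


def report():
    """Print how many bytes Arrow still counts as allocated after the workload."""
    import pyarrow as pa

    print("bytes still allocated:", leaked_bytes(ipc_round_trip, pa.total_allocated_bytes))
```

Calling `report()` should print a value at or near zero if nothing is leaking.  A value that grows as `repeats` is increased would be worth reporting, whereas process RAM that stays high while this counter returns to its baseline points at the allocator holding on to memory rather than at a leak.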
   
   So at the moment, a "big spike" is a little vague and it is difficult to tell if it is a problem or not.  How much data are you loading?  Can you provide a sample file or a sample script?  How quickly does it grow and what does it grow to?  Does it get relinquished or reused if your program runs for a long time?  Is the total_allocated_bytes counter also spiking?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org