Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/22 18:45:37 UTC

[GitHub] [arrow] kummishra opened a new issue #9295: Memory Usage increases while Reading the IPC format buffers.

kummishra opened a new issue #9295:
URL: https://github.com/apache/arrow/issues/9295


   Hello,
   
   We are seeing this issue with our data stored in the Arrow IPC format. Reading the file is very fast and takes almost no time, but we see strange behaviour when reading data values from the record batches: memory usage increases just from accessing slots in record batches read from the IPC format.
   
   With several record batches, total application memory shows a big spike. Is this expected? Is the accessed memory retained physically?
   
   Any suggestions here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on issue #9295: Memory Usage increases while Reading the IPC format buffers.

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #9295:
URL: https://github.com/apache/arrow/issues/9295#issuecomment-765634730


   Thanks for asking.  There are a number of things to consider when looking at memory allocations by Arrow.  Also, which language are you working with?
   
   Out of the box, Arrow will usually use a third-party allocator (jemalloc or mimalloc).  These allocators can sometimes behave unexpectedly.  For example, they may not relinquish RAM to the OS immediately; they might hold on to it for a while in case they can fulfill an upcoming request with it.  This makes it difficult to tell whether reported RAM usage is accurate, but there are some things to look for.
   
   Your application should eventually approach a steady state.  If it runs for a long time, RAM usage should level off and stop increasing.  If it does not, that may be evidence of a leak.
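
One rough way to test for the steady state described above is to sample memory usage periodically and check whether the most recent readings have stopped growing. The sketch below is only an illustration: the function name `has_plateaued`, the window size, and the 5% tolerance are assumptions chosen for the example, and the samples could come from any source (process RSS, or Arrow's allocated-bytes counter).

```python
def has_plateaued(samples, window=5, tolerance=0.05):
    """Return True if the last `window` readings have stopped growing.

    `samples` is a list of memory readings in bytes, oldest first.
    Growth is measured relative to the first reading in the window;
    `window` and `tolerance` are illustrative defaults, not Arrow settings.
    """
    if len(samples) < window:
        return False  # not enough data to judge yet
    recent = samples[-window:]
    baseline = recent[0]
    if baseline == 0:
        return max(recent) == 0
    return (max(recent) - baseline) / baseline <= tolerance
```

A series that keeps climbing through every window is worth investigating as a possible leak; a series that rises and then flattens is more consistent with the allocator retaining memory for reuse.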
   
   Your application should be able to utilize most of the available RAM.
   
   There is a total allocated bytes counter which you can access from the memory pool.  How you do this depends on the language; for example, in Python use [this](https://arrow.apache.org/docs/python/generated/pyarrow.total_allocated_bytes.html).  This counter shows how many bytes are currently in use (which will probably be less than the number of bytes the allocator has "reserved" from the OS).  It does not include any allocator overhead.  So if you make a call, and then release the RAM used by the call, the total allocated bytes should return to where it was previously.  This counter can be used to check for leaks.
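
The leak check described above can be sketched in Python as follows. `pyarrow.total_allocated_bytes()` is the counter linked above; the helper name `allocated_delta`, the `get_allocated_bytes` parameter, and `demo_with_pyarrow` are hypothetical names introduced for this example. The counter is passed in as a callable so the helper can also be tried with a stand-in counter.

```python
import gc


def allocated_delta(action, get_allocated_bytes=None):
    """Run action() and return the change in allocated bytes.

    By default the reading comes from pyarrow.total_allocated_bytes();
    pass another zero-argument callable to use a different counter.
    """
    if get_allocated_bytes is None:
        import pyarrow as pa  # imported lazily; only needed for the default
        get_allocated_bytes = pa.total_allocated_bytes
    before = get_allocated_bytes()
    action()
    gc.collect()  # drop unreachable Python objects that may still hold buffers
    return get_allocated_bytes() - before


def demo_with_pyarrow():
    """Illustrative usage: allocate an Arrow array, release it, report the delta."""
    import pyarrow as pa

    def build_and_release():
        arr = pa.array(range(1_000_000))
        del arr  # release the only reference to the buffers

    print("delta in bytes:", allocated_delta(build_and_release))
```

If the action releases everything it allocated, the delta should come back to zero even when the process RSS reported by the OS stays high, since the allocator may keep the freed memory reserved.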
   
   So at the moment, a "big spike" is a little vague and it is difficult to tell if it is a problem or not.  How much data are you loading?  Can you provide a sample file or a sample script?  How quickly does it grow and what does it grow to?  Does it get relinquished or reused if your program runs for a long time?  Is the total_allocated_bytes counter also spiking?





[GitHub] [arrow] wesm commented on issue #9295: Memory Usage increases while Reading the IPC format buffers.

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #9295:
URL: https://github.com/apache/arrow/issues/9295#issuecomment-769953003


   Let us discuss further either on Jira or the mailing list. 





[GitHub] [arrow] wesm closed issue #9295: Memory Usage increases while Reading the IPC format buffers.

Posted by GitBox <gi...@apache.org>.
wesm closed issue #9295:
URL: https://github.com/apache/arrow/issues/9295


   

