Posted to user@arrow.apache.org by Hrishi Dharam <Hr...@twosigma.com> on 2022/10/31 16:55:18 UTC

Buffer close - when is underlying memory freed?

Hi Arrow community,

My team operates an Arrow Flight service implemented in Java, and we also do additional allocations in the service via Arrow's BufferAllocator API.
We've recently been struggling with OOM issues for both heap and off-heap memory.
One potential issue we've noticed from a heap dump is that we have java.nio.DirectByteBuffer objects which are eligible for GC but still hold a significant volume of direct memory, and that memory can't be freed until those owning objects are actually collected.

To that end, we'd like to understand more about how Arrow's memory management works.
My understanding is that the Arrow library uses Netty for allocation, and that NettyAllocationManager internally uses java.nio.DirectByteBuffer and performs some pooling.
When we call VectorSchemaRoot::close, will the underlying direct memory buffers owned by the root be freed immediately?

On the write path, we batch incoming VSRs into a larger VSR that we allocate and use as a buffer. We append the smaller VSRs to this buffer via VectorSchemaRootAppender::append and then immediately close them. We're wondering whether this extra allocation can leave us holding extra memory if the buffers owned by the smaller VSRs aren't freed immediately when we call close.
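
For reference, the pattern looks roughly like this (heavily simplified; the schema, field name, and values are made up for illustration):

    import java.util.Collections;

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.types.pojo.ArrowType;
    import org.apache.arrow.vector.types.pojo.Field;
    import org.apache.arrow.vector.types.pojo.Schema;
    import org.apache.arrow.vector.util.VectorSchemaRootAppender;

    public class AppendPattern {
      public static void main(String[] args) {
        Schema schema = new Schema(
            Collections.singletonList(Field.nullable("x", new ArrowType.Int(32, true))));
        try (BufferAllocator allocator = new RootAllocator();
             VectorSchemaRoot batchBuffer = VectorSchemaRoot.create(schema, allocator)) {
          batchBuffer.allocateNew();

          // In the real service the smaller roots come off a Flight stream; we fabricate one here.
          try (VectorSchemaRoot incoming = VectorSchemaRoot.create(schema, allocator)) {
            incoming.allocateNew();
            ((IntVector) incoming.getVector("x")).setSafe(0, 42);
            incoming.setRowCount(1);

            // Copy the incoming batch into the larger buffer root.
            VectorSchemaRootAppender.append(batchBuffer, incoming);
          } // `incoming` is closed here -- this is the point where we'd like its direct memory freed.

          // Arrow's accounting no longer includes `incoming` at this point; the question is
          // whether the native memory behind it actually went back to the OS.
          System.out.println("Allocated after close: " + allocator.getAllocatedMemory());
        }
      }
    }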

For context, the Arrow version we are using is 7.0.0. Thank you for your help!

Sincerely,
Hrishi Dharam



Re: Buffer close - when is underlying memory freed?

Posted by David Li <li...@apache.org>.
Hi Hrishi,

In general, closing an Arrow/Java object will decrement the reference counts on the underlying buffers associated with it. This may not release them immediately if there are still other outstanding references to those buffers.
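
As a small sketch of what that means in practice (not your code, just an illustration with a raw ArrowBuf):

    import org.apache.arrow.memory.ArrowBuf;
    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;

    public class RefCountSketch {
      public static void main(String[] args) {
        try (BufferAllocator allocator = new RootAllocator()) {
          ArrowBuf buf = allocator.buffer(1024);
          buf.getReferenceManager().retain();   // pretend a second component holds a reference

          buf.close();                          // refcount 2 -> 1: nothing is released yet
          System.out.println(allocator.getAllocatedMemory());   // still nonzero

          buf.getReferenceManager().release();  // refcount 1 -> 0: the buffer goes back to the allocator
          System.out.println(allocator.getAllocatedMemory());   // 0, as far as Arrow's accounting goes
        }
      }
    }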

Assuming there are no other references, what happens next depends on the buffer size and configuration. By default, we in turn just decrement the Netty buffer's reference count. It's then up to Netty's allocator what to do with the buffer. As you pointed out, Netty uses an arena. So probably the underlying memory won't be freed immediately. (I'd have to dig more into Netty to say what exactly happens. From a quick glance, it appears each thread can maintain its own cache, so that may also be a factor, if your application creates lots of threads.)
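
If you want a quick look at whether per-thread caching is even in play, Netty exposes the defaults its pooled allocator is built from (keep in mind Arrow wraps its own PooledByteBufAllocator instance via PooledByteBufAllocatorL, so treat these as indicative only):

    import io.netty.buffer.PooledByteBufAllocator;

    public class NettyPoolDefaults {
      public static void main(String[] args) {
        // Defaults Netty derives from the io.netty.allocator.* system properties.
        System.out.println("direct arenas:         " + PooledByteBufAllocator.defaultNumDirectArena());
        System.out.println("cache for all threads: " + PooledByteBufAllocator.defaultUseCacheForAllThreads());
        System.out.println("small cache size:      " + PooledByteBufAllocator.defaultSmallCacheSize());
        System.out.println("normal cache size:     " + PooledByteBufAllocator.defaultNormalCacheSize());
      }
    }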

It appears Netty does its best to immediately free DirectByteBuffers. (You can see this in PlatformDependent.java in the Netty sources.) But if for whatever reason it can't poke the right JDK internals, then IIRC everything gets serialized on the ReferenceHandler thread, and so an application with a lot of allocation activity can 'outrun' the ability of Netty/Arrow to actually release buffers. (Though pooling is supposed to reduce the chances of this happening.)

There is a knob that can be set [1] to make allocations above a certain threshold use the sun.misc.Unsafe APIs instead. This will effectively free the buffer as soon as the reference count reaches 0, because we are bypassing Netty in that case and basically directly calling malloc/free. But by default the threshold is set to Integer.MAX_VALUE so this will never happen. (Netty itself also makes a similar optimization, for what it's worth - but I wasn't able to figure out the threshold from a quick trawl of the sources.)
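
Very roughly, the linked test wires it up like this (a sketch only: the cutoff-taking NettyAllocationManager constructor is package-private, so code like this has to live in the org.apache.arrow.memory package, and the exact builder names may differ between versions):

    package org.apache.arrow.memory;

    public class CutoffAllocatorSketch {
      // Hypothetical threshold: allocations of 1 MiB and above skip Netty's pool
      // and are allocated/freed directly (effectively malloc/free semantics).
      private static final int ALLOCATION_CUTOFF = 1024 * 1024;

      public static BufferAllocator createAllocator() {
        return new RootAllocator(BaseAllocator.configBuilder()
            .allocationManagerFactory(new AllocationManager.Factory() {
              @Override
              public AllocationManager create(BufferAllocator accountingAllocator, long size) {
                // Below the cutoff this behaves like the default pooled path;
                // at or above it, the manager bypasses Netty's arena.
                return new NettyAllocationManager(accountingAllocator, size, ALLOCATION_CUTOFF);
              }

              @Override
              public ArrowBuf empty() {
                return NettyAllocationManager.FACTORY.empty();
              }
            })
            .build());
      }
    }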

Reference: NettyAllocationManager.java [2] and PooledByteBufAllocatorL.java [3] in the Arrow sources

[1]: https://github.com/apache/arrow/blob/11647857b2860973d4b99cd5d9e7132010089469/java/memory/memory-netty/src/test/java/org/apache/arrow/memory/TestNettyAllocationManager.java#L35-L46
[2]: https://github.com/apache/arrow/blob/master/java/memory/memory-netty/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java
[3]: https://github.com/apache/arrow/blob/master/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java

-David
