You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Joris Peeters <jo...@gmail.com> on 2021/01/29 13:07:13 UTC

java arrow: memory management with multiple references to same batch

Hello,

I'm writing an HTTP server in Java that provides Arrow data to users. For
performance, I keep the most-recently-used Arrow batches in an in-memory
cache. A batch is wrapped in a "DataBatch" Java object containing the
schema and field vectors.

I'm looking for a good memory management strategy here, given the situation
that,
- batches can be evicted the in-memory cache, and the underlying memory
should be cleared up as quickly as possible, *if nothing else is using them*
,
- data retrieved from the cache undergoes a zero-copy path with filters etc
(which are views on the underlying data) before being sent out, so it can
still be in use when it gets cache-evicted, as there are multiple
simultaneous threads.

I'm used to C++, where this scenario would seem relatively unchallenging,
as we'd keep std::shared_ptr's and just clean up everything in the
destructor.

In Java, however, it seems that,
- Object#finalize is deprecated, and not super-reliable anyway,
- GC might only happen when there is pressure on the Java heap, but the
Arrow data is allocated in Netty buffers.

I wonder if people have encountered this scenario before, and what approach
was favoured. Some ideas,
- Manually maintain a ref-count and free when it goes to zero. This seems
brittle in the face of errors etc, that could lead to leaks,
- Use the PhantomReference mechanism. Would appear to suffer from the same
potential delay in GC, though, i.e. my Java object is just a little holder
for the underlying FieldVectors.  Perhaps there's a way of saying that
these DataBatch object should be GC'd often?
- Make a copy of the data when it gets retrieved from the cache, so an
eviction from the cache means it can always be safely removed. Seems very
wasteful, and not very scalable if there are other reusal paths.
- Allocate the buffers in a way that counts towards heap memory pressure.

Any thoughts are appreciated! I'm not a Java expert at all, so may be
missing obvious things, or thinking about it non-idiomatically.

Best,
-J

Re: java arrow: memory management with multiple references to same batch

Posted by Steve Kim <ch...@gmail.com>.
I recommend that you direct these questions to user@arrow.apache.org
(https://mail-archives.apache.org/mod_mbox/arrow-user/).


On Fri, Jan 29, 2021 at 7:07 AM Joris Peeters
<jo...@gmail.com> wrote:
>
> Hello,
>
> I'm writing an HTTP server in Java that provides Arrow data to users. For
> performance, I keep the most-recently-used Arrow batches in an in-memory
> cache. A batch is wrapped in a "DataBatch" Java object containing the
> schema and field vectors.
>
> I'm looking for a good memory management strategy here, given the situation
> that,
> - batches can be evicted the in-memory cache, and the underlying memory
> should be cleared up as quickly as possible, *if nothing else is using them*
> ,
> - data retrieved from the cache undergoes a zero-copy path with filters etc
> (which are views on the underlying data) before being sent out, so it can
> still be in use when it gets cache-evicted, as there are multiple
> simultaneous threads.
>
> I'm used to C++, where this scenario would seem relatively unchallenging,
> as we'd keep std::shared_ptr's and just clean up everything in the
> destructor.
>
> In Java, however, it seems that,
> - Object#finalize is deprecated, and not super-reliable anyway,
> - GC might only happen when there is pressure on the Java heap, but the
> Arrow data is allocated in Netty buffers.
>
> I wonder if people have encountered this scenario before, and what approach
> was favoured. Some ideas,
> - Manually maintain a ref-count and free when it goes to zero. This seems
> brittle in the face of errors etc, that could lead to leaks,
> - Use the PhantomReference mechanism. Would appear to suffer from the same
> potential delay in GC, though, i.e. my Java object is just a little holder
> for the underlying FieldVectors.  Perhaps there's a way of saying that
> these DataBatch object should be GC'd often?
> - Make a copy of the data when it gets retrieved from the cache, so an
> eviction from the cache means it can always be safely removed. Seems very
> wasteful, and not very scalable if there are other reusal paths.
> - Allocate the buffers in a way that counts towards heap memory pressure.
>
> Any thoughts are appreciated! I'm not a Java expert at all, so may be
> missing obvious things, or thinking about it non-idiomatically.
>
> Best,
> -J

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Hongze Zhang <no...@126.com>.
Hi Jacques,

Thanks for sharing the detailed thoughts on this. Let me try addressing them as following, with
probably re-prioritized order:

> Clearly this patch was driven by an implicit set of needs but it's hard to
> guess at what they are.

The issue the patch tries to solve is pretty simple: To avoid/monitor Java Arrow memory leakage (and
sometimes, use-after-free). Leakage will not take place when we release the buffers correctly, but
relying on GC provides a easier way to stay away from leakage. (Although it brings overheads)

At this time I think relying on GC does not necessarily indicate "release buffers via GC" only, it
could be "leveraging GC to monitor on unreleased buffers" too.

> As far as I can tell, a new allocation manager could do auto-releases.
> Why go beyond introducing a new allocation manager (or even a backstopping
> allocation manager facade). The allocation manager was actually built to
> solve for things like memory moving between C and java.

As a extension point, The AllocationManager API was, in a way, a little bit to fat. It manages both
the underlying memory chunk and allocator-buffer mappings. However when it comes to auto-release, we
don't have to make changes on chunks themselves. We only add some GC callbacks on them and that is
enough. So that dividing AllocationManager to e.g. MemoryChunkManager and MemoryChunk would make the
code cleaner, and would make it easier to reuse the current chunk allocation codes, Netty, Unsafe,
even JNI-shared chunks, etc. Also, in most cases of extending AllocationManager, it's not necessary
for the developers to touch the logic of allocator-buffer mappings. They always customizes how the
chunk is allocated and released only. See a code search[1] for "extends AllocationManager" on
GitHub. If customizations for the manager is really needed, we still provide a factory interface for
MemoryChunkManager, which is previously, AllocationManager.

> It isn't clear here how much this is proposed as a backstop mechanism
> versus a primary release mechanism. I'm more in support of an optional
> backstop mechanism so that production apps don't leak (with appropriate
> logged warnings, etc).

I would like to make it a backstop mechanism. User must configure explicitly to enable it. Otherwise
the legacy method is used. Actually personally I don't feel strong also about having a mode that
relying on GC entriely (which makes #release() a no-op, the mode Mode#GC_ONLY[2]). But a hybrid
release mode (manual + auto + warn logs) will help a lot, sometimes even in production.

> Lastly, I agree with Micah on the library use confusion. If we have close()
> methods everywhere and they javadoc they are required to release memory but
> then we implement something that makes that possibly not true (or not true
> in some contexts), wowser on the confusion.

Similarly, we can avoid providing a pure-GC mode. So that "#release()", "#close()" will still work
as expected at anytime. GC can get involved once a buffer is unused but not closed properly (which
is actually memory leak). This logic has been already implemented in Mode#HYBRID and
Mode#HYBRID_WITH_LOG[3].


By the way, based on the current information, I think another thing that can really matter is how we
define/name this whole functionality. I think it might be good to move from current "GC-based
release strategy" concept, to something like "memory leak detector", or "memory leakage handler".
Say we may have "MemoryChunkLeakHandler" as a new implementation of "MemoryChunkManager" rather than
"MemoryChunkCleaner". We can treat GC just as one method to manage possible leakage, but not express
it as a core technique the allocator system relies on. As long as we avoid providing a mode to
"disable" manual-release at all.


Thanks,
Hongze



[1] https://github.com/search?p=1&q=%22extends+AllocationManager%22&type=Code
[2] 
https://github.com/zhztheplayer/arrow-1/blob/036d7bb27b1eefa338ff295aa0dcb28197ac3355/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L504-L510
[3] 
https://github.com/zhztheplayer/arrow-1/blob/036d7bb27b1eefa338ff295aa0dcb28197ac3355/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L512-L523


On Thu, 2021-10-07 at 16:40 -0700, Jacques Nadeau wrote:
> Clearly this patch was driven by an implicit set of needs but it's hard to
> guess at what they are. As Laurent asked, what is the main goal here? There
> may be many ways to solve this goal.
> 
> Some thoughts in general:
> - The allocator is a finely tuned piece of multithreaded machinery that is
> used on hundreds of thousands of vm's ever year. As such, refactors should
> be avoided without a clear pressing need. When a need exists, it should be
> discussed extensively and thoroughly designed/vetted before proposing any
> patches.
> - Having used the library extensively, I can state that the library is
> already too heap impacting given the nature of the objects. Increasing GC
> load/reliance is actually in direct conflict to many of the past
> optimizations of the allocator and vector objects.
> - As far as I can tell, a new allocation manager could do auto-releases.
> Why go beyond introducing a new allocation manager (or even a backstopping
> allocation manager facade). The allocation manager was actually built to
> solve for things like memory moving between C and java.
> - It isn't clear here how much this is proposed as a backstop mechanism
> versus a primary release mechanism. I'm more in support of an optional
> backstop mechanism so that production apps don't leak (with appropriate
> logged warnings, etc).
> - Relying on GC to release direct memory frequently leads to early OOMs
> (exactly why the release mechanism was in place and is chosen for high perf
> apps like Netty and Arrow). It is fairly easy to allocate a bunch of direct
> memory but the GC never kicks in because heap memory is not under load.
> 
> Lastly, I agree with Micah on the library use confusion. If we have close()
> methods everywhere and they javadoc they are required to release memory but
> then we implement something that makes that possibly not true (or not true
> in some contexts), wowser on the confusion.
> 
> 
> 
> 
> On Thu, Oct 7, 2021 at 3:55 PM Micah Kornfield <em...@gmail.com>
> wrote:
> 
> > In addition to the points raised by Laurent, I'm concerned about having two
> > possible idioms in the same library.  It means code written against the
> > library becomes less portable (you need to know how the memory allocator is
> > using GC or not).
> > 
> > I understand manual memory management in Java is tedious but is there a
> > specific problem this is addressing other than making Arrow have more
> > expected semantics to Java users?  Are there maybe alternative solutions to
> > addressing the problem?
> > 
> > 
> > -Micah
> > 
> > 
> > 
> > On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com> wrote:
> > 
> > > We don't have to concern about that since no difference will be made on
> > > current manual release path unless "MemoryChunkCleaner" is explicitly
> > > specified when creating BufferAllocators. Manual ref counting will be
> > only
> > > discard/ignored in "MemoryChunkCleaner", while by default
> > > "MemoryChunkManager" is used which is totally the same as current.
> > > 
> > > 
> > > 
> > > 
> > > For example, to enable GC-based functionalities, the BufferAllocator has
> > > to be created using the following way:
> > > 
> > > 
> > > 
> > > 
> > > BufferAllocator root = new RootAllocator(
> > > 
> > >         BaseAllocator.configBuilder()
> > > 
> > >             .maxAllocation(MAX_ALLOCATION)
> > > 
> > >             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory())
> > //
> > > to enable GC-based buffer release
> > > 
> > >             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM
> > GC
> > > when allocator is full
> > > 
> > >             .build());
> > > 
> > > Thanks,
> > > 
> > > Hongze
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
> > > > Unless I missed something in the big picture, but it seems that using a
> > GC
> > > > based approach would make things much inefficient in terms of memory
> > > > management and would cause probably quite the confusion for all the
> > > systems
> > > > out there which rely on refcounting for keeping things in check, I'm not
> > > > sure why changing the default is such a good idea...
> > > > 
> > > > On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com> wrote:
> > > > 
> > > > > Hi Laurent,
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Sorry I might describe it unclearly and yes, the purpose is
> > > > > straightforward: to discard (in another strategy rather than default)
> > > the
> > > > > usage of Java buffer ref counting and rely on GC to do cleanup for the
> > > > > buffers. Though some design changes to Allocator APIs maybe required
> > to
> > > > > achieve this, for example, the decoupling of MemoryChunkAllocator from
> > > > > AllocationManager.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Should we make more discussion on it, or I can open a ticket asap?
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Hongze
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I gave a quick look at your patches but to be honest, it's not easy
> > to
> > > > > > figure out the problem you're trying to solve in the first place. Or
> > > is it
> > > > > > simply to discard the use of ref counting to track buffer usages and
> > > > > > relying on Java references to keep track of buffers at the expense of
> > > > > > creating more pressure on the GC itself to collect and free buffers?
> > > > > > 
> > > > > > On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com>
> > > wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I would like to discuss on the potential of introducing a GC-based
> > > > > > > reference management strategy to Arrow Java, and we
> > > > > > > have already been working on an implementation in our own project.
> > I
> > > > > have
> > > > > > > put the related codes in following branch and
> > > > > > > if it makes sense to upstream Apache Arrow I can open a PR for it:
> > > > > > > 
> > > > > > > 
> > > > > > > https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > > > > > > 
> > > > > > > In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > > > > > > 
> > > > > > > Commit 1: To break AllocationManager to two components:
> > > > > > > MemoryChunkManager[1] which maintains BufferLedger-
> > > > > > > BufferAllocator mappings, and MemoryChunkAllocator[2] which
> > performs
> > > the
> > > > > > > underlying allocate/destory operations. The
> > > > > > > previous customizations such as NettyAllocationManager,
> > > > > > > UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> > > > > > > as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> > > > > > > 
> > > > > > > Commit 2: To introduce a new implementation of MemoryChunkManager,
> > > > > > > MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > > > > > > converts all managed chunks to WeakReferences, and a global thread
> > > will
> > > > > > > observe on the assigned reference queue to
> > > > > > > release the unused garbage chunks which are enqueued by JVM GC.
> > Some
> > > > > > > modes[4] are there to provide different strategies,
> > > > > > > such as to disable manual reference management (default), or to
> > > enable
> > > > > > > manual and GC-based reference management at the
> > > > > > > same time, or to report leaks only.
> > > > > > > 
> > > > > > > Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > > > > BufferAllocator.
> > > > > > > This makes some special workloads, e.g.
> > > > > > > cross-JNI buffer sharing much easier as we will no longer need an
> > > > > > > Allocator directly binding to the shared memory
> > > > > > > chunks, and will still be able to leverage all of Allocator's
> > > advantages
> > > > > > > to manage the chunks (E.g. to use
> > > > > > > MemoryChunkCleaner).
> > > > > > > 
> > > > > > > Some possible Q&As about this implementation:
> > > > > > > 
> > > > > > > 1. Why adding MemoryChunkAllocator, rather than just customizing
> > > > > > > AllocationManager?
> > > > > > > It is to reuse all of current underlying chunk allocation
> > strategies,
> > > > > > > Netty, Unsafe, and others. Also, with the layer of
> > > > > > > MemoryChunk separated from AllocationManager we will be able to
> > > create
> > > > > > > MemoryChunks actively in some special cases, e.g.
> > > > > > > cross-JNI buffer sharing, C data interface buffer importing, then
> > > > > populate
> > > > > > > the chunks to a manager BufferAllocator.
> > > > > > > 
> > > > > > > 2. Why using WeakReference rather than PhantomReference?
> > > > > > > In Java 8, PhantomReference has some issue against its referent
> > > object.
> > > > > > > The object cannot be garbage collected before
> > > > > > > being enqueued. In WeakReference we don't have this issue. See
> > > ref[5].
> > > > > > > 3. Why not sun.misc.Cleaner?
> > > > > > > Cleaner API maintains a global doubly linked list to keep the
> > Cleaner
> > > > > > > instances alive. This brings overheads to us since
> > > > > > > we will create local doubly linked list for cleaning up all the
> > > buffers
> > > > > on
> > > > > > > Allocator close. See a unit test[6].
> > > > > > > 
> > > > > > > 4. Why closing all buffers on Allocator close?
> > > > > > > This behavior can be customizabale within Mode[7]s. A principle is,
> > > when
> > > > > > > relying on GC, we should allow closing buffers
> > > > > > > manually, or at least closing all of them on Allocator close. Or
> > the
> > > > > > > actual release time of the underlying chunks will
> > > > > > > be unpredictable.
> > > > > > > 
> > > > > > > 5. Can we use heap based buffers?
> > > > > > > If I am not wrong, no. The heap objects can be physically moved
> > > around
> > > > > by
> > > > > > > JVM. The addresses can vary.
> > > > > > > 
> > > > > > > 6. When should GC happen?
> > > > > > > A new AllocationListener implementation, GCTriger[8] is introduced.
> > > GC
> > > > > > > will be performed when BufferAllocator is full.
> > > > > > > 
> > > > > > > 7. Performance?
> > > > > > > Based on previous measurement, it doesn't bring overheads on the
> > > legacy
> > > > > > > path (by using default MemoryChunkManager). For
> > > > > > > GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can
> > be
> > > > > > > comparatively slower due to higher GC pressure
> > > > > > > (For example, in out Arrow-based query engine, to run TPC-DS
> > SF1000,
> > > the
> > > > > > > overhead can be up to 3%). I can collect more
> > > > > > > benchmark results in future, maybe under a JIRA ticket or a PR.
> > > > > > > 
> > > > > > > 8. Breaking changes?
> > > > > > > It doesn't break library-based allocation strategy selection like
> > > > > directly
> > > > > > > including arrow-memory-unsafe.jar to
> > > > > > > projects. However it breaks use of the previous
> > > > > AllocationManager.Factory
> > > > > > > API. Users should migrate to
> > > > > > > MemoryChunkAllocator API. Which in my opinion is simple to do.
> > > > > > > 
> > > > > > > The implementation is still evolving, so if the links get
> > out-of-date
> > > > > you
> > > > > > > can check on the newest branch head. Any
> > > > > > > suggestions welcome.
> > > > > > > 
> > > > > > > Some other discussions that may be related to this topic:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > > > > > > 
> > > > > > > 
> > https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > > > > > > Best,
> > > > > > > Hongze
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > [1]
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > > > > > > MemoryChunkManager.java
> > > > > > > [2]
> > > > > > > 
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > > > > > > [3]
> > > > > > > 
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > > > > > > [4]
> > > > > > > 
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > > [5]
> > > > > > > 
> > > > > > > 
> > https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > > > > > > [6]
> > > > > > > 
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > > > > > > [7]
> > > > > > > 
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > > [8]
> > > > > > > 
> > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > > > > > > 
> > > > > > > 


Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Jacques Nadeau <ja...@apache.org>.
Hongze, I don't realistically have time to iterate with you on this. I
agree that there could be easier apis to support leak detection. That being
said, just because things could be refactored to make something easier
isn't always a good enough reason to refactor them. I wonder if you can do
the first version on top of a an allocation manager. For example, create a
new LeakDetectingNettyAllocationManager and prove the value there before
refactoring the core allocator system?

Either way, I hope that others on the list can continue to iterate with you
on this.

On Tue, Oct 12, 2021 at 9:01 PM Hongze Zhang <no...@126.com> wrote:

> Thanks for sharing the information Laurent.
>
> I'd think JDK will always provide options to leverage garbage collector
> when it comes to cleaning up external resources
> (like native memory) binding to a object. In panama, as you can see, the
> new MemorySegment interfaces still allow users
> to register GC-hooks to do cleanup. See "Implicit deallocation" chapter in
> this documentation[1].
>
> By the way, in Arrow things can be a little bit more complicated, since
> the allocations has a local scope which is
> BufferAllocator, but JDK always uses global direct memory scope, even in
> panama.
>
> Hongze
>
> [1]
> https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemorySegment.html
>
>
>
> On Thu, 2021-10-07 at 21:40 -0700, Laurent Goujon wrote:
> > Worth also observing that the Java community has been recognizant of the
> > issues with the current approach around direct byte buffer and GC, and
> has
> > come up with a proposal/implementation to avoid interaction with GC and
> > give more direct control. The proposal (actually, its 3rd iteration) is
> > described here at https://openjdk.java.net/jeps/393, and has been
> available
> > as an incubator feature for several JDK releases (Javadoc:
> >
> https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/package-summary.html
> > )
> >
> > Laurent
> >
> >
> > On Thu, Oct 7, 2021 at 4:40 PM Jacques Nadeau <ja...@apache.org>
> wrote:
> >
> > > Clearly this patch was driven by an implicit set of needs but it's
> hard to
> > > guess at what they are. As Laurent asked, what is the main goal here?
> There
> > > may be many ways to solve this goal.
> > >
> > > Some thoughts in general:
> > > - The allocator is a finely tuned piece of multithreaded machinery
> that is
> > > used on hundreds of thousands of vm's ever year. As such, refactors
> should
> > > be avoided without a clear pressing need. When a need exists, it
> should be
> > > discussed extensively and thoroughly designed/vetted before proposing
> any
> > > patches.
> > > - Having used the library extensively, I can state that the library is
> > > already too heap impacting given the nature of the objects. Increasing
> GC
> > > load/reliance is actually in direct conflict to many of the past
> > > optimizations of the allocator and vector objects.
> > > - As far as I can tell, a new allocation manager could do
> auto-releases.
> > > Why go beyond introducing a new allocation manager (or even a
> backstopping
> > > allocation manager facade). The allocation manager was actually built
> to
> > > solve for things like memory moving between C and java.
> > > - It isn't clear here how much this is proposed as a backstop mechanism
> > > versus a primary release mechanism. I'm more in support of an optional
> > > backstop mechanism so that production apps don't leak (with appropriate
> > > logged warnings, etc).
> > > - Relying on GC to release direct memory frequently leads to early OOMs
> > > (exactly why the release mechanism was in place and is chosen for high
> perf
> > > apps like Netty and Arrow). It is fairly easy to allocate a bunch of
> direct
> > > memory but the GC never kicks in because heap memory is not under load.
> > >
> > > Lastly, I agree with Micah on the library use confusion. If we have
> close()
> > > methods everywhere and they javadoc they are required to release
> memory but
> > > then we implement something that makes that possibly not true (or not
> true
> > > in some contexts), wowser on the confusion.
> > >
> > >
> > >
> > >
> > > On Thu, Oct 7, 2021 at 3:55 PM Micah Kornfield <em...@gmail.com>
> > > wrote:
> > >
> > > > In addition to the points raised by Laurent, I'm concerned about
> having
> > > two
> > > > possible idioms in the same library.  It means code written against
> the
> > > > library becomes less portable (you need to know how the memory
> allocator
> > > is
> > > > using GC or not).
> > > >
> > > > I understand manual memory management in Java is tedious but is
> there a
> > > > specific problem this is addressing other than making Arrow have more
> > > > expected semantics to Java users?  Are there maybe alternative
> solutions
> > > to
> > > > addressing the problem?
> > > >
> > > >
> > > > -Micah
> > > >
> > > >
> > > >
> > > > On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com>
> wrote:
> > > >
> > > > > We don't have to concern about that since no difference will be
> made on
> > > > > current manual release path unless "MemoryChunkCleaner" is
> explicitly
> > > > > specified when creating BufferAllocators. Manual ref counting will
> be
> > > > only
> > > > > discard/ignored in "MemoryChunkCleaner", while by default
> > > > > "MemoryChunkManager" is used which is totally the same as current.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > For example, to enable GC-based functionalities, the
> BufferAllocator
> > > has
> > > > > to be created using the following way:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > BufferAllocator root = new RootAllocator(
> > > > >
> > > > >         BaseAllocator.configBuilder()
> > > > >
> > > > >             .maxAllocation(MAX_ALLOCATION)
> > > > >
> > > > >
> .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory())
> > > > //
> > > > > to enable GC-based buffer release
> > > > >
> > > > >             .listener(MemoryChunkCleaner.gcTrigger()) // to
> trigger JVM
> > > > GC
> > > > > when allocator is full
> > > > >
> > > > >             .build());
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Hongze
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com>
> wrote:
> > > > > > Unless I missed something in the big picture, but it seems that
> using
> > > a
> > > > GC
> > > > > > based approach would make things much inefficient in terms of
> memory
> > > > > > management and would cause probably quite the confusion for all
> the
> > > > > systems
> > > > > > out there which rely on refcounting for keeping things in check,
> I'm
> > > not
> > > > > > sure why changing the default is such a good idea...
> > > > > >
> > > > > > On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Laurent,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Sorry I might describe it unclearly and yes, the purpose is
> > > > > > > straightforward: to discard (in another strategy rather than
> > > default)
> > > > > the
> > > > > > > usage of Java buffer ref counting and rely on GC to do cleanup
> for
> > > the
> > > > > > > buffers. Though some design changes to Allocator APIs maybe
> required
> > > > to
> > > > > > > achieve this, for example, the decoupling of
> MemoryChunkAllocator
> > > from
> > > > > > > AllocationManager.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Should we make more discussion on it, or I can open a ticket
> asap?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Hongze
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com>
> > > wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I gave a quick look at your patches but to be honest, it's
> not easy
> > > > to
> > > > > > > > figure out the problem you're trying to solve in the first
> place.
> > > Or
> > > > > is it
> > > > > > > > simply to discard the use of ref counting to track buffer
> usages
> > > and
> > > > > > > > relying on Java references to keep track of buffers at the
> expense
> > > of
> > > > > > > > creating more pressure on the GC itself to collect and free
> > > buffers?
> > > > > > > >
> > > > > > > > On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <
> notifyzhz@126.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I would like to discuss on the potential of introducing a
> > > GC-based
> > > > > > > > > reference management strategy to Arrow Java, and we
> > > > > > > > > have already been working on an implementation in our own
> > > project.
> > > > I
> > > > > > > have
> > > > > > > > > put the related codes in following branch and
> > > > > > > > > if it makes sense to upstream Apache Arrow I can open a PR
> for
> > > it:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > > > > > > > >
> > > > > > > > > In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > > > > > > > >
> > > > > > > > > Commit 1: To break AllocationManager to two components:
> > > > > > > > > MemoryChunkManager[1] which maintains BufferLedger-
> > > > > > > > > BufferAllocator mappings, and MemoryChunkAllocator[2] which
> > > > performs
> > > > > the
> > > > > > > > > underlying allocate/destory operations. The
> > > > > > > > > previous customizations such as NettyAllocationManager,
> > > > > > > > > UnsafeAllocationManager, are moved to MemoryChunkAllocator
> API,
> > > > > > > > > as NettyMemoryChunkAllocator and
> UnsafeMemoryChunkAllocator.
> > > > > > > > >
> > > > > > > > > Commit 2: To introduce a new implementation of
> > > MemoryChunkManager,
> > > > > > > > > MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > > > > > > > > converts all managed chunks to WeakReferences, and a global
> > > thread
> > > > > will
> > > > > > > > > observe on the assigned reference queue to
> > > > > > > > > release the unused garbage chunks which are enqueued by
> JVM GC.
> > > > Some
> > > > > > > > > modes[4] are there to provide different strategies,
> > > > > > > > > such as to disable manual reference management (default),
> or to
> > > > > enable
> > > > > > > > > manual and GC-based reference management at the
> > > > > > > > > same time, or to report leaks only.
> > > > > > > > >
> > > > > > > > > Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > > > > > > BufferAllocator.
> > > > > > > > > This makes some special workloads, e.g.
> > > > > > > > > cross-JNI buffer sharing much easier as we will no longer
> need an
> > > > > > > > > Allocator directly binding to the shared memory
> > > > > > > > > chunks, and will still be able to leverage all of
> Allocator's
> > > > > advantages
> > > > > > > > > to manage the chunks (E.g. to use
> > > > > > > > > MemoryChunkCleaner).
> > > > > > > > >
> > > > > > > > > Some possible Q&As about this implementation:
> > > > > > > > >
> > > > > > > > > 1. Why adding MemoryChunkAllocator, rather than just
> customizing
> > > > > > > > > AllocationManager?
> > > > > > > > > It is to reuse all of current underlying chunk allocation
> > > > strategies,
> > > > > > > > > Netty, Unsafe, and others. Also, with the layer of
> > > > > > > > > MemoryChunk separated from AllocationManager we will be
> able to
> > > > > create
> > > > > > > > > MemoryChunks actively in some special cases, e.g.
> > > > > > > > > cross-JNI buffer sharing, C data interface buffer
> importing, then
> > > > > > > populate
> > > > > > > > > the chunks to a manager BufferAllocator.
> > > > > > > > >
> > > > > > > > > 2. Why using WeakReference rather than PhantomReference?
> > > > > > > > > In Java 8, PhantomReference has some issue against its
> referent
> > > > > object.
> > > > > > > > > The object cannot be garbage collected before
> > > > > > > > > being enqueued. In WeakReference we don't have this issue.
> See
> > > > > ref[5].
> > > > > > > > >
> > > > > > > > > 3. Why not sun.misc.Cleaner?
> > > > > > > > > Cleaner API maintains a global doubly linked list to keep
> the
> > > > Cleaner
> > > > > > > > > instances alive. This brings overheads to us since
> > > > > > > > > we will create local doubly linked list for cleaning up
> all the
> > > > > buffers
> > > > > > > on
> > > > > > > > > Allocator close. See a unit test[6].
> > > > > > > > >
> > > > > > > > > 4. Why closing all buffers on Allocator close?
> > > > > > > > > This behavior can be customizabale within Mode[7]s. A
> principle
> > > is,
> > > > > when
> > > > > > > > > relying on GC, we should allow closing buffers
> > > > > > > > > manually, or at least closing all of them on Allocator
> close. Or
> > > > the
> > > > > > > > > actual release time of the underlying chunks will
> > > > > > > > > be unpredictable.
> > > > > > > > >
> > > > > > > > > 5. Can we use heap based buffers?
> > > > > > > > > If I am not wrong, no. The heap objects can be physically
> moved
> > > > > around
> > > > > > > by
> > > > > > > > > JVM. The addresses can vary.
> > > > > > > > >
> > > > > > > > > 6. When should GC happen?
> > > > > > > > > A new AllocationListener implementation, GCTriger[8] is
> > > introduced.
> > > > > GC
> > > > > > > > > will be performed when BufferAllocator is full.
> > > > > > > > >
> > > > > > > > > 7. Performance?
> > > > > > > > > Based on previous measurement, it doesn't bring overheads
> on the
> > > > > legacy
> > > > > > > > > path (by using default MemoryChunkManager). For
> > > > > > > > > GC-based path (MemoryChunkCleaner + Mode.GC_ONLY),
> allocations
> > > can
> > > > be
> > > > > > > > > comparatively slower due to higher GC pressure
> > > > > > > > > (For example, in out Arrow-based query engine, to run
> TPC-DS
> > > > SF1000,
> > > > > the
> > > > > > > > > overhead can be up to 3%). I can collect more
> > > > > > > > > benchmark results in future, maybe under a JIRA ticket or
> a PR.
> > > > > > > > >
> > > > > > > > > 8. Breaking changes?
> > > > > > > > > It doesn't break library-based allocation strategy
> selection like
> > > > > > > directly
> > > > > > > > > including arrow-memory-unsafe.jar to
> > > > > > > > > projects. However it breaks use of the previous
> > > > > > > AllocationManager.Factory
> > > > > > > > > API. Users should migrate to
> > > > > > > > > MemoryChunkAllocator API. Which in my opinion is simple to
> do.
> > > > > > > > >
> > > > > > > > > The implementation is still evolving, so if the links get
> > > > out-of-date
> > > > > > > you
> > > > > > > > > can check on the newest branch head. Any
> > > > > > > > > suggestions welcome.
> > > > > > > > >
> > > > > > > > > Some other discussions that may be related to this topic:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Hongze
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > > > > > > > > MemoryChunkManager.java
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > > > > > > > >
> > > > > > > > > [3]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > > > > > > > > [4]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > > > > [5]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > > > > > > > > [6]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > > > > > > > > [7]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > > > > [8]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
>
>
>

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Hongze Zhang <no...@126.com>.
Thanks for sharing the information Laurent.

I'd think JDK will always provide options to leverage garbage collector when it comes to cleaning up external resources
(like native memory) binding to a object. In panama, as you can see, the new MemorySegment interfaces still allow users
to register GC-hooks to do cleanup. See "Implicit deallocation" chapter in this documentation[1].

By the way, in Arrow things can be a little bit more complicated, since the allocations has a local scope which is
BufferAllocator, but JDK always uses global direct memory scope, even in panama.

Hongze

[1] https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemorySegment.html



On Thu, 2021-10-07 at 21:40 -0700, Laurent Goujon wrote:
> Worth also observing that the Java community has been recognizant of the
> issues with the current approach around direct byte buffer and GC, and has
> come up with a proposal/implementation to avoid interaction with GC and
> give more direct control. The proposal (actually, its 3rd iteration) is
> described here at https://openjdk.java.net/jeps/393, and has been available
> as an incubator feature for several JDK releases (Javadoc:
> https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/package-summary.html
> )
> 
> Laurent
> 
> 
> On Thu, Oct 7, 2021 at 4:40 PM Jacques Nadeau <ja...@apache.org> wrote:
> 
> > Clearly this patch was driven by an implicit set of needs but it's hard to
> > guess at what they are. As Laurent asked, what is the main goal here? There
> > may be many ways to solve this goal.
> > 
> > Some thoughts in general:
> > - The allocator is a finely tuned piece of multithreaded machinery that is
> > used on hundreds of thousands of vm's ever year. As such, refactors should
> > be avoided without a clear pressing need. When a need exists, it should be
> > discussed extensively and thoroughly designed/vetted before proposing any
> > patches.
> > - Having used the library extensively, I can state that the library is
> > already too heap impacting given the nature of the objects. Increasing GC
> > load/reliance is actually in direct conflict to many of the past
> > optimizations of the allocator and vector objects.
> > - As far as I can tell, a new allocation manager could do auto-releases.
> > Why go beyond introducing a new allocation manager (or even a backstopping
> > allocation manager facade). The allocation manager was actually built to
> > solve for things like memory moving between C and java.
> > - It isn't clear here how much this is proposed as a backstop mechanism
> > versus a primary release mechanism. I'm more in support of an optional
> > backstop mechanism so that production apps don't leak (with appropriate
> > logged warnings, etc).
> > - Relying on GC to release direct memory frequently leads to early OOMs
> > (exactly why the release mechanism was in place and is chosen for high perf
> > apps like Netty and Arrow). It is fairly easy to allocate a bunch of direct
> > memory but the GC never kicks in because heap memory is not under load.
> > 
> > Lastly, I agree with Micah on the library use confusion. If we have close()
> > methods everywhere and they javadoc they are required to release memory but
> > then we implement something that makes that possibly not true (or not true
> > in some contexts), wowser on the confusion.
> > 
> > 
> > 
> > 
> > On Thu, Oct 7, 2021 at 3:55 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> > 
> > > In addition to the points raised by Laurent, I'm concerned about having
> > two
> > > possible idioms in the same library.  It means code written against the
> > > library becomes less portable (you need to know how the memory allocator
> > is
> > > using GC or not).
> > > 
> > > I understand manual memory management in Java is tedious but is there a
> > > specific problem this is addressing other than making Arrow have more
> > > expected semantics to Java users?  Are there maybe alternative solutions
> > to
> > > addressing the problem?
> > > 
> > > 
> > > -Micah
> > > 
> > > 
> > > 
> > > On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com> wrote:
> > > 
> > > > We don't have to concern about that since no difference will be made on
> > > > current manual release path unless "MemoryChunkCleaner" is explicitly
> > > > specified when creating BufferAllocators. Manual ref counting will be
> > > only
> > > > discard/ignored in "MemoryChunkCleaner", while by default
> > > > "MemoryChunkManager" is used which is totally the same as current.
> > > > 
> > > > 
> > > > 
> > > > 
> > > > For example, to enable GC-based functionalities, the BufferAllocator
> > has
> > > > to be created using the following way:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > BufferAllocator root = new RootAllocator(
> > > > 
> > > >         BaseAllocator.configBuilder()
> > > > 
> > > >             .maxAllocation(MAX_ALLOCATION)
> > > > 
> > > >             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory())
> > > //
> > > > to enable GC-based buffer release
> > > > 
> > > >             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM
> > > GC
> > > > when allocator is full
> > > > 
> > > >             .build());
> > > > 
> > > > Thanks,
> > > > 
> > > > Hongze
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
> > > > > Unless I missed something in the big picture, but it seems that using
> > a
> > > GC
> > > > > based approach would make things much inefficient in terms of memory
> > > > > management and would cause probably quite the confusion for all the
> > > > systems
> > > > > out there which rely on refcounting for keeping things in check, I'm
> > not
> > > > > sure why changing the default is such a good idea...
> > > > > 
> > > > > On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com>
> > wrote:
> > > > > 
> > > > > > Hi Laurent,
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Sorry I might describe it unclearly and yes, the purpose is
> > > > > > straightforward: to discard (in another strategy rather than
> > default)
> > > > the
> > > > > > usage of Java buffer ref counting and rely on GC to do cleanup for
> > the
> > > > > > buffers. Though some design changes to Allocator APIs maybe required
> > > to
> > > > > > achieve this, for example, the decoupling of MemoryChunkAllocator
> > from
> > > > > > AllocationManager.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Should we make more discussion on it, or I can open a ticket asap?
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > Hongze
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com>
> > wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I gave a quick look at your patches but to be honest, it's not easy
> > > to
> > > > > > > figure out the problem you're trying to solve in the first place.
> > Or
> > > > is it
> > > > > > > simply to discard the use of ref counting to track buffer usages
> > and
> > > > > > > relying on Java references to keep track of buffers at the expense
> > of
> > > > > > > creating more pressure on the GC itself to collect and free
> > buffers?
> > > > > > > 
> > > > > > > On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com>
> > > > wrote:
> > > > > > > 
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > I would like to discuss on the potential of introducing a
> > GC-based
> > > > > > > > reference management strategy to Arrow Java, and we
> > > > > > > > have already been working on an implementation in our own
> > project.
> > > I
> > > > > > have
> > > > > > > > put the related codes in following branch and
> > > > > > > > if it makes sense to upstream Apache Arrow I can open a PR for
> > it:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > > > > > > > 
> > > > > > > > In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > > > > > > > 
> > > > > > > > Commit 1: To break AllocationManager to two components:
> > > > > > > > MemoryChunkManager[1] which maintains BufferLedger-
> > > > > > > > BufferAllocator mappings, and MemoryChunkAllocator[2] which
> > > performs
> > > > the
> > > > > > > > underlying allocate/destory operations. The
> > > > > > > > previous customizations such as NettyAllocationManager,
> > > > > > > > UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> > > > > > > > as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> > > > > > > > 
> > > > > > > > Commit 2: To introduce a new implementation of
> > MemoryChunkManager,
> > > > > > > > MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > > > > > > > converts all managed chunks to WeakReferences, and a global
> > thread
> > > > will
> > > > > > > > observe on the assigned reference queue to
> > > > > > > > release the unused garbage chunks which are enqueued by JVM GC.
> > > Some
> > > > > > > > modes[4] are there to provide different strategies,
> > > > > > > > such as to disable manual reference management (default), or to
> > > > enable
> > > > > > > > manual and GC-based reference management at the
> > > > > > > > same time, or to report leaks only.
> > > > > > > > 
> > > > > > > > Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > > > > > BufferAllocator.
> > > > > > > > This makes some special workloads, e.g.
> > > > > > > > cross-JNI buffer sharing much easier as we will no longer need an
> > > > > > > > Allocator directly binding to the shared memory
> > > > > > > > chunks, and will still be able to leverage all of Allocator's
> > > > advantages
> > > > > > > > to manage the chunks (E.g. to use
> > > > > > > > MemoryChunkCleaner).
> > > > > > > > 
> > > > > > > > Some possible Q&As about this implementation:
> > > > > > > > 
> > > > > > > > 1. Why adding MemoryChunkAllocator, rather than just customizing
> > > > > > > > AllocationManager?
> > > > > > > > It is to reuse all of current underlying chunk allocation
> > > strategies,
> > > > > > > > Netty, Unsafe, and others. Also, with the layer of
> > > > > > > > MemoryChunk separated from AllocationManager we will be able to
> > > > create
> > > > > > > > MemoryChunks actively in some special cases, e.g.
> > > > > > > > cross-JNI buffer sharing, C data interface buffer importing, then
> > > > > > populate
> > > > > > > > the chunks to a manager BufferAllocator.
> > > > > > > > 
> > > > > > > > 2. Why using WeakReference rather than PhantomReference?
> > > > > > > > In Java 8, PhantomReference has some issue against its referent
> > > > object.
> > > > > > > > The object cannot be garbage collected before
> > > > > > > > being enqueued. In WeakReference we don't have this issue. See
> > > > ref[5].
> > > > > > > > 
> > > > > > > > 3. Why not sun.misc.Cleaner?
> > > > > > > > Cleaner API maintains a global doubly linked list to keep the
> > > Cleaner
> > > > > > > > instances alive. This brings overheads to us since
> > > > > > > > we will create local doubly linked list for cleaning up all the
> > > > buffers
> > > > > > on
> > > > > > > > Allocator close. See a unit test[6].
> > > > > > > > 
> > > > > > > > 4. Why closing all buffers on Allocator close?
> > > > > > > > This behavior can be customizabale within Mode[7]s. A principle
> > is,
> > > > when
> > > > > > > > relying on GC, we should allow closing buffers
> > > > > > > > manually, or at least closing all of them on Allocator close. Or
> > > the
> > > > > > > > actual release time of the underlying chunks will
> > > > > > > > be unpredictable.
> > > > > > > > 
> > > > > > > > 5. Can we use heap based buffers?
> > > > > > > > If I am not wrong, no. The heap objects can be physically moved
> > > > around
> > > > > > by
> > > > > > > > JVM. The addresses can vary.
> > > > > > > > 
> > > > > > > > 6. When should GC happen?
> > > > > > > > A new AllocationListener implementation, GCTriger[8] is
> > introduced.
> > > > GC
> > > > > > > > will be performed when BufferAllocator is full.
> > > > > > > > 
> > > > > > > > 7. Performance?
> > > > > > > > Based on previous measurement, it doesn't bring overheads on the
> > > > legacy
> > > > > > > > path (by using default MemoryChunkManager). For
> > > > > > > > GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations
> > can
> > > be
> > > > > > > > comparatively slower due to higher GC pressure
> > > > > > > > (For example, in out Arrow-based query engine, to run TPC-DS
> > > SF1000,
> > > > the
> > > > > > > > overhead can be up to 3%). I can collect more
> > > > > > > > benchmark results in future, maybe under a JIRA ticket or a PR.
> > > > > > > > 
> > > > > > > > 8. Breaking changes?
> > > > > > > > It doesn't break library-based allocation strategy selection like
> > > > > > directly
> > > > > > > > including arrow-memory-unsafe.jar to
> > > > > > > > projects. However it breaks use of the previous
> > > > > > AllocationManager.Factory
> > > > > > > > API. Users should migrate to
> > > > > > > > MemoryChunkAllocator API. Which in my opinion is simple to do.
> > > > > > > > 
> > > > > > > > The implementation is still evolving, so if the links get
> > > out-of-date
> > > > > > you
> > > > > > > > can check on the newest branch head. Any
> > > > > > > > suggestions welcome.
> > > > > > > > 
> > > > > > > > Some other discussions that may be related to this topic:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > > > > > > > 
> > > > > > > > Best,
> > > > > > > > Hongze
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > [1]
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > > > > > > > MemoryChunkManager.java
> > > > > > > > [2]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > > > > > > > 
> > > > > > > > [3]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > > > > > > > [4]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > > > [5]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > > > > > > > [6]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > > > > > > > [7]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > > > [8]
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > 



Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Laurent Goujon <la...@dremio.com>.
Worth also observing that the Java community has been recognizant of the
issues with the current approach around direct byte buffer and GC, and has
come up with a proposal/implementation to avoid interaction with GC and
give more direct control. The proposal (actually, its 3rd iteration) is
described here at https://openjdk.java.net/jeps/393, and has been available
as an incubator feature for several JDK releases (Javadoc:
https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/package-summary.html
)

Laurent


On Thu, Oct 7, 2021 at 4:40 PM Jacques Nadeau <ja...@apache.org> wrote:

> Clearly this patch was driven by an implicit set of needs but it's hard to
> guess at what they are. As Laurent asked, what is the main goal here? There
> may be many ways to solve this goal.
>
> Some thoughts in general:
> - The allocator is a finely tuned piece of multithreaded machinery that is
> used on hundreds of thousands of vm's ever year. As such, refactors should
> be avoided without a clear pressing need. When a need exists, it should be
> discussed extensively and thoroughly designed/vetted before proposing any
> patches.
> - Having used the library extensively, I can state that the library is
> already too heap impacting given the nature of the objects. Increasing GC
> load/reliance is actually in direct conflict to many of the past
> optimizations of the allocator and vector objects.
> - As far as I can tell, a new allocation manager could do auto-releases.
> Why go beyond introducing a new allocation manager (or even a backstopping
> allocation manager facade). The allocation manager was actually built to
> solve for things like memory moving between C and java.
> - It isn't clear here how much this is proposed as a backstop mechanism
> versus a primary release mechanism. I'm more in support of an optional
> backstop mechanism so that production apps don't leak (with appropriate
> logged warnings, etc).
> - Relying on GC to release direct memory frequently leads to early OOMs
> (exactly why the release mechanism was in place and is chosen for high perf
> apps like Netty and Arrow). It is fairly easy to allocate a bunch of direct
> memory but the GC never kicks in because heap memory is not under load.
>
> Lastly, I agree with Micah on the library use confusion. If we have close()
> methods everywhere and they javadoc they are required to release memory but
> then we implement something that makes that possibly not true (or not true
> in some contexts), wowser on the confusion.
>
>
>
>
> On Thu, Oct 7, 2021 at 3:55 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > In addition to the points raised by Laurent, I'm concerned about having
> two
> > possible idioms in the same library.  It means code written against the
> > library becomes less portable (you need to know how the memory allocator
> is
> > using GC or not).
> >
> > I understand manual memory management in Java is tedious but is there a
> > specific problem this is addressing other than making Arrow have more
> > expected semantics to Java users?  Are there maybe alternative solutions
> to
> > addressing the problem?
> >
> >
> > -Micah
> >
> >
> >
> > On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com> wrote:
> >
> > > We don't have to concern about that since no difference will be made on
> > > current manual release path unless "MemoryChunkCleaner" is explicitly
> > > specified when creating BufferAllocators. Manual ref counting will be
> > only
> > > discard/ignored in "MemoryChunkCleaner", while by default
> > > "MemoryChunkManager" is used which is totally the same as current.
> > >
> > >
> > >
> > >
> > > For example, to enable GC-based functionalities, the BufferAllocator
> has
> > > to be created using the following way:
> > >
> > >
> > >
> > >
> > > BufferAllocator root = new RootAllocator(
> > >
> > >         BaseAllocator.configBuilder()
> > >
> > >             .maxAllocation(MAX_ALLOCATION)
> > >
> > >             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory())
> > //
> > > to enable GC-based buffer release
> > >
> > >             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM
> > GC
> > > when allocator is full
> > >
> > >             .build());
> > >
> > > Thanks,
> > >
> > > Hongze
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
> > > >Unless I missed something in the big picture, but it seems that using
> a
> > GC
> > > >based approach would make things much inefficient in terms of memory
> > > >management and would cause probably quite the confusion for all the
> > > systems
> > > >out there which rely on refcounting for keeping things in check, I'm
> not
> > > >sure why changing the default is such a good idea...
> > > >
> > > >On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com>
> wrote:
> > > >
> > > >> Hi Laurent,
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Sorry I might describe it unclearly and yes, the purpose is
> > > >> straightforward: to discard (in another strategy rather than
> default)
> > > the
> > > >> usage of Java buffer ref counting and rely on GC to do cleanup for
> the
> > > >> buffers. Though some design changes to Allocator APIs maybe required
> > to
> > > >> achieve this, for example, the decoupling of MemoryChunkAllocator
> from
> > > >> AllocationManager.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Should we make more discussion on it, or I can open a ticket asap?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Hongze
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com>
> wrote:
> > > >> >Hi,
> > > >> >
> > > >> >I gave a quick look at your patches but to be honest, it's not easy
> > to
> > > >> >figure out the problem you're trying to solve in the first place.
> Or
> > > is it
> > > >> >simply to discard the use of ref counting to track buffer usages
> and
> > > >> >relying on Java references to keep track of buffers at the expense
> of
> > > >> >creating more pressure on the GC itself to collect and free
> buffers?
> > > >> >
> > > >> >On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com>
> > > wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> I would like to discuss on the potential of introducing a
> GC-based
> > > >> >> reference management strategy to Arrow Java, and we
> > > >> >> have already been working on an implementation in our own
> project.
> > I
> > > >> have
> > > >> >> put the related codes in following branch and
> > > >> >> if it makes sense to upstream Apache Arrow I can open a PR for
> it:
> > > >> >>
> > > >> >>
> > > >> >>
> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > > >> >>
> > > >> >> In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > > >> >>
> > > >> >> Commit 1: To break AllocationManager to two components:
> > > >> >> MemoryChunkManager[1] which maintains BufferLedger-
> > > >> >> BufferAllocator mappings, and MemoryChunkAllocator[2] which
> > performs
> > > the
> > > >> >> underlying allocate/destory operations. The
> > > >> >> previous customizations such as NettyAllocationManager,
> > > >> >> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> > > >> >> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> > > >> >>
> > > >> >> Commit 2: To introduce a new implementation of
> MemoryChunkManager,
> > > >> >> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > > >> >> converts all managed chunks to WeakReferences, and a global
> thread
> > > will
> > > >> >> observe on the assigned reference queue to
> > > >> >> release the unused garbage chunks which are enqueued by JVM GC.
> > Some
> > > >> >> modes[4] are there to provide different strategies,
> > > >> >> such as to disable manual reference management (default), or to
> > > enable
> > > >> >> manual and GC-based reference management at the
> > > >> >> same time, or to report leaks only.
> > > >> >>
> > > >> >> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > > >> BufferAllocator.
> > > >> >> This makes some special workloads, e.g.
> > > >> >> cross-JNI buffer sharing much easier as we will no longer need an
> > > >> >> Allocator directly binding to the shared memory
> > > >> >> chunks, and will still be able to leverage all of Allocator's
> > > advantages
> > > >> >> to manage the chunks (E.g. to use
> > > >> >> MemoryChunkCleaner).
> > > >> >>
> > > >> >> Some possible Q&As about this implementation:
> > > >> >>
> > > >> >> 1. Why adding MemoryChunkAllocator, rather than just customizing
> > > >> >> AllocationManager?
> > > >> >> It is to reuse all of current underlying chunk allocation
> > strategies,
> > > >> >> Netty, Unsafe, and others. Also, with the layer of
> > > >> >> MemoryChunk separated from AllocationManager we will be able to
> > > create
> > > >> >> MemoryChunks actively in some special cases, e.g.
> > > >> >> cross-JNI buffer sharing, C data interface buffer importing, then
> > > >> populate
> > > >> >> the chunks to a manager BufferAllocator.
> > > >> >>
> > > >> >> 2. Why using WeakReference rather than PhantomReference?
> > > >> >> In Java 8, PhantomReference has some issue against its referent
> > > object.
> > > >> >> The object cannot be garbage collected before
> > > >> >> being enqueued. In WeakReference we don't have this issue. See
> > > ref[5].
> > > >> >>
> > > >> >> 3. Why not sun.misc.Cleaner?
> > > >> >> Cleaner API maintains a global doubly linked list to keep the
> > Cleaner
> > > >> >> instances alive. This brings overheads to us since
> > > >> >> we will create local doubly linked list for cleaning up all the
> > > buffers
> > > >> on
> > > >> >> Allocator close. See a unit test[6].
> > > >> >>
> > > >> >> 4. Why closing all buffers on Allocator close?
> > > >> >> This behavior can be customizabale within Mode[7]s. A principle
> is,
> > > when
> > > >> >> relying on GC, we should allow closing buffers
> > > >> >> manually, or at least closing all of them on Allocator close. Or
> > the
> > > >> >> actual release time of the underlying chunks will
> > > >> >> be unpredictable.
> > > >> >>
> > > >> >> 5. Can we use heap based buffers?
> > > >> >> If I am not wrong, no. The heap objects can be physically moved
> > > around
> > > >> by
> > > >> >> JVM. The addresses can vary.
> > > >> >>
> > > >> >> 6. When should GC happen?
> > > >> >> A new AllocationListener implementation, GCTriger[8] is
> introduced.
> > > GC
> > > >> >> will be performed when BufferAllocator is full.
> > > >> >>
> > > >> >> 7. Performance?
> > > >> >> Based on previous measurement, it doesn't bring overheads on the
> > > legacy
> > > >> >> path (by using default MemoryChunkManager). For
> > > >> >> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations
> can
> > be
> > > >> >> comparatively slower due to higher GC pressure
> > > >> >> (For example, in out Arrow-based query engine, to run TPC-DS
> > SF1000,
> > > the
> > > >> >> overhead can be up to 3%). I can collect more
> > > >> >> benchmark results in future, maybe under a JIRA ticket or a PR.
> > > >> >>
> > > >> >> 8. Breaking changes?
> > > >> >> It doesn't break library-based allocation strategy selection like
> > > >> directly
> > > >> >> including arrow-memory-unsafe.jar to
> > > >> >> projects. However it breaks use of the previous
> > > >> AllocationManager.Factory
> > > >> >> API. Users should migrate to
> > > >> >> MemoryChunkAllocator API. Which in my opinion is simple to do.
> > > >> >>
> > > >> >> The implementation is still evolving, so if the links get
> > out-of-date
> > > >> you
> > > >> >> can check on the newest branch head. Any
> > > >> >> suggestions welcome.
> > > >> >>
> > > >> >> Some other discussions that may be related to this topic:
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > > >> >>
> > > >> >> Best,
> > > >> >> Hongze
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> [1]
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > > >> >> MemoryChunkManager.java
> > > >> >> [2]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > > >> >>
> > > >> >> [3]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > > >> >> [4]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > >> >> [5]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > > >> >> [6]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > > >> >> [7]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > >> >> [8]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
>

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Jacques Nadeau <ja...@apache.org>.
Clearly this patch was driven by an implicit set of needs but it's hard to
guess at what they are. As Laurent asked, what is the main goal here? There
may be many ways to solve this goal.

Some thoughts in general:
- The allocator is a finely tuned piece of multithreaded machinery that is
used on hundreds of thousands of vm's ever year. As such, refactors should
be avoided without a clear pressing need. When a need exists, it should be
discussed extensively and thoroughly designed/vetted before proposing any
patches.
- Having used the library extensively, I can state that the library is
already too heap impacting given the nature of the objects. Increasing GC
load/reliance is actually in direct conflict to many of the past
optimizations of the allocator and vector objects.
- As far as I can tell, a new allocation manager could do auto-releases.
Why go beyond introducing a new allocation manager (or even a backstopping
allocation manager facade). The allocation manager was actually built to
solve for things like memory moving between C and java.
- It isn't clear here how much this is proposed as a backstop mechanism
versus a primary release mechanism. I'm more in support of an optional
backstop mechanism so that production apps don't leak (with appropriate
logged warnings, etc).
- Relying on GC to release direct memory frequently leads to early OOMs
(exactly why the release mechanism was in place and is chosen for high perf
apps like Netty and Arrow). It is fairly easy to allocate a bunch of direct
memory but the GC never kicks in because heap memory is not under load.

Lastly, I agree with Micah on the library use confusion. If we have close()
methods everywhere and they javadoc they are required to release memory but
then we implement something that makes that possibly not true (or not true
in some contexts), wowser on the confusion.




On Thu, Oct 7, 2021 at 3:55 PM Micah Kornfield <em...@gmail.com>
wrote:

> In addition to the points raised by Laurent, I'm concerned about having two
> possible idioms in the same library.  It means code written against the
> library becomes less portable (you need to know how the memory allocator is
> using GC or not).
>
> I understand manual memory management in Java is tedious but is there a
> specific problem this is addressing other than making Arrow have more
> expected semantics to Java users?  Are there maybe alternative solutions to
> addressing the problem?
>
>
> -Micah
>
>
>
> On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com> wrote:
>
> > We don't have to concern about that since no difference will be made on
> > current manual release path unless "MemoryChunkCleaner" is explicitly
> > specified when creating BufferAllocators. Manual ref counting will be
> only
> > discard/ignored in "MemoryChunkCleaner", while by default
> > "MemoryChunkManager" is used which is totally the same as current.
> >
> >
> >
> >
> > For example, to enable GC-based functionalities, the BufferAllocator has
> > to be created using the following way:
> >
> >
> >
> >
> > BufferAllocator root = new RootAllocator(
> >
> >         BaseAllocator.configBuilder()
> >
> >             .maxAllocation(MAX_ALLOCATION)
> >
> >             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory())
> //
> > to enable GC-based buffer release
> >
> >             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM
> GC
> > when allocator is full
> >
> >             .build());
> >
> > Thanks,
> >
> > Hongze
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
> > >Unless I missed something in the big picture, but it seems that using a
> GC
> > >based approach would make things much inefficient in terms of memory
> > >management and would cause probably quite the confusion for all the
> > systems
> > >out there which rely on refcounting for keeping things in check, I'm not
> > >sure why changing the default is such a good idea...
> > >
> > >On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com> wrote:
> > >
> > >> Hi Laurent,
> > >>
> > >>
> > >>
> > >>
> > >> Sorry I might describe it unclearly and yes, the purpose is
> > >> straightforward: to discard (in another strategy rather than default)
> > the
> > >> usage of Java buffer ref counting and rely on GC to do cleanup for the
> > >> buffers. Though some design changes to Allocator APIs maybe required
> to
> > >> achieve this, for example, the decoupling of MemoryChunkAllocator from
> > >> AllocationManager.
> > >>
> > >>
> > >>
> > >>
> > >> Should we make more discussion on it, or I can open a ticket asap?
> > >>
> > >>
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Hongze
> > >>
> > >>
> > >>
> > >>
> > >> At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
> > >> >Hi,
> > >> >
> > >> >I gave a quick look at your patches but to be honest, it's not easy
> to
> > >> >figure out the problem you're trying to solve in the first place. Or
> > is it
> > >> >simply to discard the use of ref counting to track buffer usages and
> > >> >relying on Java references to keep track of buffers at the expense of
> > >> >creating more pressure on the GC itself to collect and free buffers?
> > >> >
> > >> >On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com>
> > wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I would like to discuss on the potential of introducing a GC-based
> > >> >> reference management strategy to Arrow Java, and we
> > >> >> have already been working on an implementation in our own project.
> I
> > >> have
> > >> >> put the related codes in following branch and
> > >> >> if it makes sense to upstream Apache Arrow I can open a PR for it:
> > >> >>
> > >> >>
> > >> >> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > >> >>
> > >> >> In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > >> >>
> > >> >> Commit 1: To break AllocationManager to two components:
> > >> >> MemoryChunkManager[1] which maintains BufferLedger-
> > >> >> BufferAllocator mappings, and MemoryChunkAllocator[2] which
> performs
> > the
> > >> >> underlying allocate/destory operations. The
> > >> >> previous customizations such as NettyAllocationManager,
> > >> >> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> > >> >> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> > >> >>
> > >> >> Commit 2: To introduce a new implementation of MemoryChunkManager,
> > >> >> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > >> >> converts all managed chunks to WeakReferences, and a global thread
> > will
> > >> >> observe on the assigned reference queue to
> > >> >> release the unused garbage chunks which are enqueued by JVM GC.
> Some
> > >> >> modes[4] are there to provide different strategies,
> > >> >> such as to disable manual reference management (default), or to
> > enable
> > >> >> manual and GC-based reference management at the
> > >> >> same time, or to report leaks only.
> > >> >>
> > >> >> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > >> BufferAllocator.
> > >> >> This makes some special workloads, e.g.
> > >> >> cross-JNI buffer sharing much easier as we will no longer need an
> > >> >> Allocator directly binding to the shared memory
> > >> >> chunks, and will still be able to leverage all of Allocator's
> > advantages
> > >> >> to manage the chunks (E.g. to use
> > >> >> MemoryChunkCleaner).
> > >> >>
> > >> >> Some possible Q&As about this implementation:
> > >> >>
> > >> >> 1. Why adding MemoryChunkAllocator, rather than just customizing
> > >> >> AllocationManager?
> > >> >> It is to reuse all of current underlying chunk allocation
> strategies,
> > >> >> Netty, Unsafe, and others. Also, with the layer of
> > >> >> MemoryChunk separated from AllocationManager we will be able to
> > create
> > >> >> MemoryChunks actively in some special cases, e.g.
> > >> >> cross-JNI buffer sharing, C data interface buffer importing, then
> > >> populate
> > >> >> the chunks to a manager BufferAllocator.
> > >> >>
> > >> >> 2. Why using WeakReference rather than PhantomReference?
> > >> >> In Java 8, PhantomReference has some issue against its referent
> > object.
> > >> >> The object cannot be garbage collected before
> > >> >> being enqueued. In WeakReference we don't have this issue. See
> > ref[5].
> > >> >>
> > >> >> 3. Why not sun.misc.Cleaner?
> > >> >> Cleaner API maintains a global doubly linked list to keep the
> Cleaner
> > >> >> instances alive. This brings overheads to us since
> > >> >> we will create local doubly linked list for cleaning up all the
> > buffers
> > >> on
> > >> >> Allocator close. See a unit test[6].
> > >> >>
> > >> >> 4. Why closing all buffers on Allocator close?
> > >> >> This behavior can be customizabale within Mode[7]s. A principle is,
> > when
> > >> >> relying on GC, we should allow closing buffers
> > >> >> manually, or at least closing all of them on Allocator close. Or
> the
> > >> >> actual release time of the underlying chunks will
> > >> >> be unpredictable.
> > >> >>
> > >> >> 5. Can we use heap based buffers?
> > >> >> If I am not wrong, no. The heap objects can be physically moved
> > around
> > >> by
> > >> >> JVM. The addresses can vary.
> > >> >>
> > >> >> 6. When should GC happen?
> > >> >> A new AllocationListener implementation, GCTriger[8] is introduced.
> > GC
> > >> >> will be performed when BufferAllocator is full.
> > >> >>
> > >> >> 7. Performance?
> > >> >> Based on previous measurement, it doesn't bring overheads on the
> > legacy
> > >> >> path (by using default MemoryChunkManager). For
> > >> >> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can
> be
> > >> >> comparatively slower due to higher GC pressure
> > >> >> (For example, in out Arrow-based query engine, to run TPC-DS
> SF1000,
> > the
> > >> >> overhead can be up to 3%). I can collect more
> > >> >> benchmark results in future, maybe under a JIRA ticket or a PR.
> > >> >>
> > >> >> 8. Breaking changes?
> > >> >> It doesn't break library-based allocation strategy selection like
> > >> directly
> > >> >> including arrow-memory-unsafe.jar to
> > >> >> projects. However it breaks use of the previous
> > >> AllocationManager.Factory
> > >> >> API. Users should migrate to
> > >> >> MemoryChunkAllocator API. Which in my opinion is simple to do.
> > >> >>
> > >> >> The implementation is still evolving, so if the links get
> out-of-date
> > >> you
> > >> >> can check on the newest branch head. Any
> > >> >> suggestions welcome.
> > >> >>
> > >> >> Some other discussions that may be related to this topic:
> > >> >>
> > >> >>
> > >> >>
> > >>
> >
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > >> >>
> > >> >>
> > >> >>
> > >>
> >
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > >> >>
> > >> >> Best,
> > >> >> Hongze
> > >> >>
> > >> >>
> > >> >>
> > >> >> [1]
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > >> >> MemoryChunkManager.java
> > >> >> [2]
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > >> >>
> > >> >> [3]
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > >> >> [4]
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > >> >> [5]
> > >> >>
> > >> >>
> > >>
> >
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > >> >> [6]
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > >> >> [7]
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > >> >> [8]
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > >> >>
> > >> >>
> > >> >>
> > >>
> >
>

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Hongze Zhang <no...@126.com>.
Hi Micah,

Please see my another reply. Do you think making the whole approach as "leak detector/handler" will
help reducing the complexity of these semantics?

We can also remove the mode that disables manual-release. So when the delector is enabled, that only
unclosed buffers will be handled by GC/allocator-close, probably with warning logs. We should
provide the possibility to let the leaked buffers automatically get cleaned because, although users
know there are leakages via logs, they might not want the service in production ends up OOM before
they figure out the leakage source.

Thanks,
Hongze


On Thu, 2021-10-07 at 15:55 -0700, Micah Kornfield wrote:
> In addition to the points raised by Laurent, I'm concerned about having two
> possible idioms in the same library.  It means code written against the
> library becomes less portable (you need to know how the memory allocator is
> using GC or not).
> 
> I understand manual memory management in Java is tedious but is there a
> specific problem this is addressing other than making Arrow have more
> expected semantics to Java users?  Are there maybe alternative solutions to
> addressing the problem?
> 
> 
> -Micah
> 
> 
> 
> On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com> wrote:
> 
> > We don't have to concern about that since no difference will be made on
> > current manual release path unless "MemoryChunkCleaner" is explicitly
> > specified when creating BufferAllocators. Manual ref counting will be only
> > discard/ignored in "MemoryChunkCleaner", while by default
> > "MemoryChunkManager" is used which is totally the same as current.
> > 
> > 
> > 
> > 
> > For example, to enable GC-based functionalities, the BufferAllocator has
> > to be created using the following way:
> > 
> > 
> > 
> > 
> > BufferAllocator root = new RootAllocator(
> > 
> >         BaseAllocator.configBuilder()
> > 
> >             .maxAllocation(MAX_ALLOCATION)
> > 
> >             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory()) //
> > to enable GC-based buffer release
> > 
> >             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM GC
> > when allocator is full
> > 
> >             .build());
> > 
> > Thanks,
> > 
> > Hongze
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
> > > Unless I missed something in the big picture, but it seems that using a GC
> > > based approach would make things much inefficient in terms of memory
> > > management and would cause probably quite the confusion for all the
> > systems
> > > out there which rely on refcounting for keeping things in check, I'm not
> > > sure why changing the default is such a good idea...
> > > 
> > > On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com> wrote:
> > > 
> > > > Hi Laurent,
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Sorry I might describe it unclearly and yes, the purpose is
> > > > straightforward: to discard (in another strategy rather than default)
> > the
> > > > usage of Java buffer ref counting and rely on GC to do cleanup for the
> > > > buffers. Though some design changes to Allocator APIs maybe required to
> > > > achieve this, for example, the decoupling of MemoryChunkAllocator from
> > > > AllocationManager.
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Should we make more discussion on it, or I can open a ticket asap?
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > Hongze
> > > > 
> > > > 
> > > > 
> > > > 
> > > > At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
> > > > > Hi,
> > > > > 
> > > > > I gave a quick look at your patches but to be honest, it's not easy to
> > > > > figure out the problem you're trying to solve in the first place. Or
> > is it
> > > > > simply to discard the use of ref counting to track buffer usages and
> > > > > relying on Java references to keep track of buffers at the expense of
> > > > > creating more pressure on the GC itself to collect and free buffers?
> > > > > 
> > > > > On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com>
> > wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I would like to discuss on the potential of introducing a GC-based
> > > > > > reference management strategy to Arrow Java, and we
> > > > > > have already been working on an implementation in our own project. I
> > > > have
> > > > > > put the related codes in following branch and
> > > > > > if it makes sense to upstream Apache Arrow I can open a PR for it:
> > > > > > 
> > > > > > 
> > > > > > https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > > > > > 
> > > > > > In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > > > > > 
> > > > > > Commit 1: To break AllocationManager to two components:
> > > > > > MemoryChunkManager[1] which maintains BufferLedger-
> > > > > > BufferAllocator mappings, and MemoryChunkAllocator[2] which performs
> > the
> > > > > > underlying allocate/destory operations. The
> > > > > > previous customizations such as NettyAllocationManager,
> > > > > > UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> > > > > > as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> > > > > > 
> > > > > > Commit 2: To introduce a new implementation of MemoryChunkManager,
> > > > > > MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > > > > > converts all managed chunks to WeakReferences, and a global thread
> > will
> > > > > > observe on the assigned reference queue to
> > > > > > release the unused garbage chunks which are enqueued by JVM GC. Some
> > > > > > modes[4] are there to provide different strategies,
> > > > > > such as to disable manual reference management (default), or to
> > enable
> > > > > > manual and GC-based reference management at the
> > > > > > same time, or to report leaks only.
> > > > > > 
> > > > > > Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > > > BufferAllocator.
> > > > > > This makes some special workloads, e.g.
> > > > > > cross-JNI buffer sharing much easier as we will no longer need an
> > > > > > Allocator directly binding to the shared memory
> > > > > > chunks, and will still be able to leverage all of Allocator's
> > advantages
> > > > > > to manage the chunks (E.g. to use
> > > > > > MemoryChunkCleaner).
> > > > > > 
> > > > > > Some possible Q&As about this implementation:
> > > > > > 
> > > > > > 1. Why adding MemoryChunkAllocator, rather than just customizing
> > > > > > AllocationManager?
> > > > > > It is to reuse all of current underlying chunk allocation strategies,
> > > > > > Netty, Unsafe, and others. Also, with the layer of
> > > > > > MemoryChunk separated from AllocationManager we will be able to
> > create
> > > > > > MemoryChunks actively in some special cases, e.g.
> > > > > > cross-JNI buffer sharing, C data interface buffer importing, then
> > > > populate
> > > > > > the chunks to a manager BufferAllocator.
> > > > > > 
> > > > > > 2. Why using WeakReference rather than PhantomReference?
> > > > > > In Java 8, PhantomReference has some issue against its referent
> > object.
> > > > > > The object cannot be garbage collected before
> > > > > > being enqueued. In WeakReference we don't have this issue. See
> > ref[5].
> > > > > > 3. Why not sun.misc.Cleaner?
> > > > > > Cleaner API maintains a global doubly linked list to keep the Cleaner
> > > > > > instances alive. This brings overheads to us since
> > > > > > we will create local doubly linked list for cleaning up all the
> > buffers
> > > > on
> > > > > > Allocator close. See a unit test[6].
> > > > > > 
> > > > > > 4. Why closing all buffers on Allocator close?
> > > > > > This behavior can be customizabale within Mode[7]s. A principle is,
> > when
> > > > > > relying on GC, we should allow closing buffers
> > > > > > manually, or at least closing all of them on Allocator close. Or the
> > > > > > actual release time of the underlying chunks will
> > > > > > be unpredictable.
> > > > > > 
> > > > > > 5. Can we use heap based buffers?
> > > > > > If I am not wrong, no. The heap objects can be physically moved
> > around
> > > > by
> > > > > > JVM. The addresses can vary.
> > > > > > 
> > > > > > 6. When should GC happen?
> > > > > > A new AllocationListener implementation, GCTriger[8] is introduced.
> > GC
> > > > > > will be performed when BufferAllocator is full.
> > > > > > 
> > > > > > 7. Performance?
> > > > > > Based on previous measurement, it doesn't bring overheads on the
> > legacy
> > > > > > path (by using default MemoryChunkManager). For
> > > > > > GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be
> > > > > > comparatively slower due to higher GC pressure
> > > > > > (For example, in out Arrow-based query engine, to run TPC-DS SF1000,
> > the
> > > > > > overhead can be up to 3%). I can collect more
> > > > > > benchmark results in future, maybe under a JIRA ticket or a PR.
> > > > > > 
> > > > > > 8. Breaking changes?
> > > > > > It doesn't break library-based allocation strategy selection like
> > > > directly
> > > > > > including arrow-memory-unsafe.jar to
> > > > > > projects. However it breaks use of the previous
> > > > AllocationManager.Factory
> > > > > > API. Users should migrate to
> > > > > > MemoryChunkAllocator API. Which in my opinion is simple to do.
> > > > > > 
> > > > > > The implementation is still evolving, so if the links get out-of-date
> > > > you
> > > > > > can check on the newest branch head. Any
> > > > > > suggestions welcome.
> > > > > > 
> > > > > > Some other discussions that may be related to this topic:
> > > > > > 
> > > > > > 
> > > > > > 
> > https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > > > > > 
> > > > > > 
> > https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > > > > > Best,
> > > > > > Hongze
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > [1]
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > > > > > MemoryChunkManager.java
> > > > > > [2]
> > > > > > 
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > > > > > [3]
> > > > > > 
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > > > > > [4]
> > > > > > 
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > [5]
> > > > > > 
> > > > > > 
> > https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > > > > > [6]
> > > > > > 
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > > > > > [7]
> > > > > > 
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > > > > [8]
> > > > > > 
> > > > > > 
> > https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > > > > > 
> > > > > > 


Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Micah Kornfield <em...@gmail.com>.
In addition to the points raised by Laurent, I'm concerned about having two
possible idioms in the same library.  It means code written against the
library becomes less portable (you need to know how the memory allocator is
using GC or not).

I understand manual memory management in Java is tedious but is there a
specific problem this is addressing other than making Arrow have more
expected semantics to Java users?  Are there maybe alternative solutions to
addressing the problem?


-Micah



On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <no...@126.com> wrote:

> We don't have to concern about that since no difference will be made on
> current manual release path unless "MemoryChunkCleaner" is explicitly
> specified when creating BufferAllocators. Manual ref counting will be only
> discard/ignored in "MemoryChunkCleaner", while by default
> "MemoryChunkManager" is used which is totally the same as current.
>
>
>
>
> For example, to enable GC-based functionalities, the BufferAllocator has
> to be created using the following way:
>
>
>
>
> BufferAllocator root = new RootAllocator(
>
>         BaseAllocator.configBuilder()
>
>             .maxAllocation(MAX_ALLOCATION)
>
>             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory()) //
> to enable GC-based buffer release
>
>             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM GC
> when allocator is full
>
>             .build());
>
> Thanks,
>
> Hongze
>
>
>
>
>
>
>
>
> At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
> >Unless I missed something in the big picture, but it seems that using a GC
> >based approach would make things much inefficient in terms of memory
> >management and would cause probably quite the confusion for all the
> systems
> >out there which rely on refcounting for keeping things in check, I'm not
> >sure why changing the default is such a good idea...
> >
> >On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com> wrote:
> >
> >> Hi Laurent,
> >>
> >>
> >>
> >>
> >> Sorry I might describe it unclearly and yes, the purpose is
> >> straightforward: to discard (in another strategy rather than default)
> the
> >> usage of Java buffer ref counting and rely on GC to do cleanup for the
> >> buffers. Though some design changes to Allocator APIs maybe required to
> >> achieve this, for example, the decoupling of MemoryChunkAllocator from
> >> AllocationManager.
> >>
> >>
> >>
> >>
> >> Should we make more discussion on it, or I can open a ticket asap?
> >>
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Hongze
> >>
> >>
> >>
> >>
> >> At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
> >> >Hi,
> >> >
> >> >I gave a quick look at your patches but to be honest, it's not easy to
> >> >figure out the problem you're trying to solve in the first place. Or
> is it
> >> >simply to discard the use of ref counting to track buffer usages and
> >> >relying on Java references to keep track of buffers at the expense of
> >> >creating more pressure on the GC itself to collect and free buffers?
> >> >
> >> >On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com>
> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I would like to discuss on the potential of introducing a GC-based
> >> >> reference management strategy to Arrow Java, and we
> >> >> have already been working on an implementation in our own project. I
> >> have
> >> >> put the related codes in following branch and
> >> >> if it makes sense to upstream Apache Arrow I can open a PR for it:
> >> >>
> >> >>
> >> >> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> >> >>
> >> >> In the branch, regarding Commit 1, Commit 2 and Commit 3:
> >> >>
> >> >> Commit 1: To break AllocationManager to two components:
> >> >> MemoryChunkManager[1] which maintains BufferLedger-
> >> >> BufferAllocator mappings, and MemoryChunkAllocator[2] which performs
> the
> >> >> underlying allocate/destory operations. The
> >> >> previous customizations such as NettyAllocationManager,
> >> >> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> >> >> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> >> >>
> >> >> Commit 2: To introduce a new implementation of MemoryChunkManager,
> >> >> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> >> >> converts all managed chunks to WeakReferences, and a global thread
> will
> >> >> observe on the assigned reference queue to
> >> >> release the unused garbage chunks which are enqueued by JVM GC. Some
> >> >> modes[4] are there to provide different strategies,
> >> >> such as to disable manual reference management (default), or to
> enable
> >> >> manual and GC-based reference management at the
> >> >> same time, or to report leaks only.
> >> >>
> >> >> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> >> BufferAllocator.
> >> >> This makes some special workloads, e.g.
> >> >> cross-JNI buffer sharing much easier as we will no longer need an
> >> >> Allocator directly binding to the shared memory
> >> >> chunks, and will still be able to leverage all of Allocator's
> advantages
> >> >> to manage the chunks (E.g. to use
> >> >> MemoryChunkCleaner).
> >> >>
> >> >> Some possible Q&As about this implementation:
> >> >>
> >> >> 1. Why adding MemoryChunkAllocator, rather than just customizing
> >> >> AllocationManager?
> >> >> It is to reuse all of current underlying chunk allocation strategies,
> >> >> Netty, Unsafe, and others. Also, with the layer of
> >> >> MemoryChunk separated from AllocationManager we will be able to
> create
> >> >> MemoryChunks actively in some special cases, e.g.
> >> >> cross-JNI buffer sharing, C data interface buffer importing, then
> >> populate
> >> >> the chunks to a manager BufferAllocator.
> >> >>
> >> >> 2. Why using WeakReference rather than PhantomReference?
> >> >> In Java 8, PhantomReference has some issue against its referent
> object.
> >> >> The object cannot be garbage collected before
> >> >> being enqueued. In WeakReference we don't have this issue. See
> ref[5].
> >> >>
> >> >> 3. Why not sun.misc.Cleaner?
> >> >> Cleaner API maintains a global doubly linked list to keep the Cleaner
> >> >> instances alive. This brings overheads to us since
> >> >> we will create local doubly linked list for cleaning up all the
> buffers
> >> on
> >> >> Allocator close. See a unit test[6].
> >> >>
> >> >> 4. Why closing all buffers on Allocator close?
> >> >> This behavior can be customizabale within Mode[7]s. A principle is,
> when
> >> >> relying on GC, we should allow closing buffers
> >> >> manually, or at least closing all of them on Allocator close. Or the
> >> >> actual release time of the underlying chunks will
> >> >> be unpredictable.
> >> >>
> >> >> 5. Can we use heap based buffers?
> >> >> If I am not wrong, no. The heap objects can be physically moved
> around
> >> by
> >> >> JVM. The addresses can vary.
> >> >>
> >> >> 6. When should GC happen?
> >> >> A new AllocationListener implementation, GCTriger[8] is introduced.
> GC
> >> >> will be performed when BufferAllocator is full.
> >> >>
> >> >> 7. Performance?
> >> >> Based on previous measurement, it doesn't bring overheads on the
> legacy
> >> >> path (by using default MemoryChunkManager). For
> >> >> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be
> >> >> comparatively slower due to higher GC pressure
> >> >> (For example, in out Arrow-based query engine, to run TPC-DS SF1000,
> the
> >> >> overhead can be up to 3%). I can collect more
> >> >> benchmark results in future, maybe under a JIRA ticket or a PR.
> >> >>
> >> >> 8. Breaking changes?
> >> >> It doesn't break library-based allocation strategy selection like
> >> directly
> >> >> including arrow-memory-unsafe.jar to
> >> >> projects. However it breaks use of the previous
> >> AllocationManager.Factory
> >> >> API. Users should migrate to
> >> >> MemoryChunkAllocator API. Which in my opinion is simple to do.
> >> >>
> >> >> The implementation is still evolving, so if the links get out-of-date
> >> you
> >> >> can check on the newest branch head. Any
> >> >> suggestions welcome.
> >> >>
> >> >> Some other discussions that may be related to this topic:
> >> >>
> >> >>
> >> >>
> >>
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> >> >>
> >> >>
> >> >>
> >>
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> >> >>
> >> >> Best,
> >> >> Hongze
> >> >>
> >> >>
> >> >>
> >> >> [1]
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> >> >> MemoryChunkManager.java
> >> >> [2]
> >> >>
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> >> >>
> >> >> [3]
> >> >>
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> >> >> [4]
> >> >>
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> >> >> [5]
> >> >>
> >> >>
> >>
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> >> >> [6]
> >> >>
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> >> >> [7]
> >> >>
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> >> >> [8]
> >> >>
> >> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> >> >>
> >> >>
> >> >>
> >>
>

Re:Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Hongze Zhang <no...@126.com>.
We don't have to concern about that since no difference will be made on current manual release path unless "MemoryChunkCleaner" is explicitly specified when creating BufferAllocators. Manual ref counting will be only discard/ignored in "MemoryChunkCleaner", while by default "MemoryChunkManager" is used which is totally the same as current.




For example, to enable GC-based functionalities, the BufferAllocator has to be created using the following way:




BufferAllocator root = new RootAllocator(

        BaseAllocator.configBuilder()

            .maxAllocation(MAX_ALLOCATION)

            .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory()) // to enable GC-based buffer release

            .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM GC when allocator is full

            .build());

Thanks,

Hongze








At 2021-10-07 12:53:23, "Laurent Goujon" <la...@dremio.com> wrote:
>Unless I missed something in the big picture, but it seems that using a GC
>based approach would make things much inefficient in terms of memory
>management and would cause probably quite the confusion for all the systems
>out there which rely on refcounting for keeping things in check, I'm not
>sure why changing the default is such a good idea...
>
>On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com> wrote:
>
>> Hi Laurent,
>>
>>
>>
>>
>> Sorry I might describe it unclearly and yes, the purpose is
>> straightforward: to discard (in another strategy rather than default) the
>> usage of Java buffer ref counting and rely on GC to do cleanup for the
>> buffers. Though some design changes to Allocator APIs maybe required to
>> achieve this, for example, the decoupling of MemoryChunkAllocator from
>> AllocationManager.
>>
>>
>>
>>
>> Should we make more discussion on it, or I can open a ticket asap?
>>
>>
>>
>>
>> Thanks,
>>
>> Hongze
>>
>>
>>
>>
>> At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
>> >Hi,
>> >
>> >I gave a quick look at your patches but to be honest, it's not easy to
>> >figure out the problem you're trying to solve in the first place. Or is it
>> >simply to discard the use of ref counting to track buffer usages and
>> >relying on Java references to keep track of buffers at the expense of
>> >creating more pressure on the GC itself to collect and free buffers?
>> >
>> >On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I would like to discuss on the potential of introducing a GC-based
>> >> reference management strategy to Arrow Java, and we
>> >> have already been working on an implementation in our own project. I
>> have
>> >> put the related codes in following branch and
>> >> if it makes sense to upstream Apache Arrow I can open a PR for it:
>> >>
>> >>
>> >> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
>> >>
>> >> In the branch, regarding Commit 1, Commit 2 and Commit 3:
>> >>
>> >> Commit 1: To break AllocationManager to two components:
>> >> MemoryChunkManager[1] which maintains BufferLedger-
>> >> BufferAllocator mappings, and MemoryChunkAllocator[2] which performs the
>> >> underlying allocate/destory operations. The
>> >> previous customizations such as NettyAllocationManager,
>> >> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
>> >> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
>> >>
>> >> Commit 2: To introduce a new implementation of MemoryChunkManager,
>> >> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
>> >> converts all managed chunks to WeakReferences, and a global thread will
>> >> observe on the assigned reference queue to
>> >> release the unused garbage chunks which are enqueued by JVM GC. Some
>> >> modes[4] are there to provide different strategies,
>> >> such as to disable manual reference management (default), or to enable
>> >> manual and GC-based reference management at the
>> >> same time, or to report leaks only.
>> >>
>> >> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
>> BufferAllocator.
>> >> This makes some special workloads, e.g.
>> >> cross-JNI buffer sharing much easier as we will no longer need an
>> >> Allocator directly binding to the shared memory
>> >> chunks, and will still be able to leverage all of Allocator's advantages
>> >> to manage the chunks (E.g. to use
>> >> MemoryChunkCleaner).
>> >>
>> >> Some possible Q&As about this implementation:
>> >>
>> >> 1. Why adding MemoryChunkAllocator, rather than just customizing
>> >> AllocationManager?
>> >> It is to reuse all of current underlying chunk allocation strategies,
>> >> Netty, Unsafe, and others. Also, with the layer of
>> >> MemoryChunk separated from AllocationManager we will be able to create
>> >> MemoryChunks actively in some special cases, e.g.
>> >> cross-JNI buffer sharing, C data interface buffer importing, then
>> populate
>> >> the chunks to a manager BufferAllocator.
>> >>
>> >> 2. Why using WeakReference rather than PhantomReference?
>> >> In Java 8, PhantomReference has some issue against its referent object.
>> >> The object cannot be garbage collected before
>> >> being enqueued. In WeakReference we don't have this issue. See ref[5].
>> >>
>> >> 3. Why not sun.misc.Cleaner?
>> >> Cleaner API maintains a global doubly linked list to keep the Cleaner
>> >> instances alive. This brings overheads to us since
>> >> we will create local doubly linked list for cleaning up all the buffers
>> on
>> >> Allocator close. See a unit test[6].
>> >>
>> >> 4. Why closing all buffers on Allocator close?
>> >> This behavior can be customizabale within Mode[7]s. A principle is, when
>> >> relying on GC, we should allow closing buffers
>> >> manually, or at least closing all of them on Allocator close. Or the
>> >> actual release time of the underlying chunks will
>> >> be unpredictable.
>> >>
>> >> 5. Can we use heap based buffers?
>> >> If I am not wrong, no. The heap objects can be physically moved around
>> by
>> >> JVM. The addresses can vary.
>> >>
>> >> 6. When should GC happen?
>> >> A new AllocationListener implementation, GCTriger[8] is introduced. GC
>> >> will be performed when BufferAllocator is full.
>> >>
>> >> 7. Performance?
>> >> Based on previous measurement, it doesn't bring overheads on the legacy
>> >> path (by using default MemoryChunkManager). For
>> >> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be
>> >> comparatively slower due to higher GC pressure
>> >> (For example, in out Arrow-based query engine, to run TPC-DS SF1000, the
>> >> overhead can be up to 3%). I can collect more
>> >> benchmark results in future, maybe under a JIRA ticket or a PR.
>> >>
>> >> 8. Breaking changes?
>> >> It doesn't break library-based allocation strategy selection like
>> directly
>> >> including arrow-memory-unsafe.jar to
>> >> projects. However it breaks use of the previous
>> AllocationManager.Factory
>> >> API. Users should migrate to
>> >> MemoryChunkAllocator API. Which in my opinion is simple to do.
>> >>
>> >> The implementation is still evolving, so if the links get out-of-date
>> you
>> >> can check on the newest branch head. Any
>> >> suggestions welcome.
>> >>
>> >> Some other discussions that may be related to this topic:
>> >>
>> >>
>> >>
>> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
>> >>
>> >>
>> >>
>> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
>> >>
>> >> Best,
>> >> Hongze
>> >>
>> >>
>> >>
>> >> [1]
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
>> >> MemoryChunkManager.java
>> >> [2]
>> >>
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
>> >>
>> >> [3]
>> >>
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
>> >> [4]
>> >>
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
>> >> [5]
>> >>
>> >>
>> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
>> >> [6]
>> >>
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
>> >> [7]
>> >>
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
>> >> [8]
>> >>
>> >>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
>> >>
>> >>
>> >>
>>

Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Laurent Goujon <la...@dremio.com>.
Unless I missed something in the big picture, but it seems that using a GC
based approach would make things much inefficient in terms of memory
management and would cause probably quite the confusion for all the systems
out there which rely on refcounting for keeping things in check, I'm not
sure why changing the default is such a good idea...

On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <no...@126.com> wrote:

> Hi Laurent,
>
>
>
>
> Sorry I might describe it unclearly and yes, the purpose is
> straightforward: to discard (in another strategy rather than default) the
> usage of Java buffer ref counting and rely on GC to do cleanup for the
> buffers. Though some design changes to Allocator APIs maybe required to
> achieve this, for example, the decoupling of MemoryChunkAllocator from
> AllocationManager.
>
>
>
>
> Should we make more discussion on it, or I can open a ticket asap?
>
>
>
>
> Thanks,
>
> Hongze
>
>
>
>
> At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
> >Hi,
> >
> >I gave a quick look at your patches but to be honest, it's not easy to
> >figure out the problem you're trying to solve in the first place. Or is it
> >simply to discard the use of ref counting to track buffer usages and
> >relying on Java references to keep track of buffers at the expense of
> >creating more pressure on the GC itself to collect and free buffers?
> >
> >On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com> wrote:
> >
> >> Hi,
> >>
> >> I would like to discuss on the potential of introducing a GC-based
> >> reference management strategy to Arrow Java, and we
> >> have already been working on an implementation in our own project. I
> have
> >> put the related codes in following branch and
> >> if it makes sense to upstream Apache Arrow I can open a PR for it:
> >>
> >>
> >> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> >>
> >> In the branch, regarding Commit 1, Commit 2 and Commit 3:
> >>
> >> Commit 1: To break AllocationManager to two components:
> >> MemoryChunkManager[1] which maintains BufferLedger-
> >> BufferAllocator mappings, and MemoryChunkAllocator[2] which performs the
> >> underlying allocate/destory operations. The
> >> previous customizations such as NettyAllocationManager,
> >> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> >> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> >>
> >> Commit 2: To introduce a new implementation of MemoryChunkManager,
> >> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> >> converts all managed chunks to WeakReferences, and a global thread will
> >> observe on the assigned reference queue to
> >> release the unused garbage chunks which are enqueued by JVM GC. Some
> >> modes[4] are there to provide different strategies,
> >> such as to disable manual reference management (default), or to enable
> >> manual and GC-based reference management at the
> >> same time, or to report leaks only.
> >>
> >> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> BufferAllocator.
> >> This makes some special workloads, e.g.
> >> cross-JNI buffer sharing much easier as we will no longer need an
> >> Allocator directly binding to the shared memory
> >> chunks, and will still be able to leverage all of Allocator's advantages
> >> to manage the chunks (E.g. to use
> >> MemoryChunkCleaner).
> >>
> >> Some possible Q&As about this implementation:
> >>
> >> 1. Why adding MemoryChunkAllocator, rather than just customizing
> >> AllocationManager?
> >> It is to reuse all of current underlying chunk allocation strategies,
> >> Netty, Unsafe, and others. Also, with the layer of
> >> MemoryChunk separated from AllocationManager we will be able to create
> >> MemoryChunks actively in some special cases, e.g.
> >> cross-JNI buffer sharing, C data interface buffer importing, then
> populate
> >> the chunks to a manager BufferAllocator.
> >>
> >> 2. Why using WeakReference rather than PhantomReference?
> >> In Java 8, PhantomReference has some issue against its referent object.
> >> The object cannot be garbage collected before
> >> being enqueued. In WeakReference we don't have this issue. See ref[5].
> >>
> >> 3. Why not sun.misc.Cleaner?
> >> Cleaner API maintains a global doubly linked list to keep the Cleaner
> >> instances alive. This brings overheads to us since
> >> we will create local doubly linked list for cleaning up all the buffers
> on
> >> Allocator close. See a unit test[6].
> >>
> >> 4. Why closing all buffers on Allocator close?
> >> This behavior can be customizabale within Mode[7]s. A principle is, when
> >> relying on GC, we should allow closing buffers
> >> manually, or at least closing all of them on Allocator close. Or the
> >> actual release time of the underlying chunks will
> >> be unpredictable.
> >>
> >> 5. Can we use heap based buffers?
> >> If I am not wrong, no. The heap objects can be physically moved around
> by
> >> JVM. The addresses can vary.
> >>
> >> 6. When should GC happen?
> >> A new AllocationListener implementation, GCTriger[8] is introduced. GC
> >> will be performed when BufferAllocator is full.
> >>
> >> 7. Performance?
> >> Based on previous measurement, it doesn't bring overheads on the legacy
> >> path (by using default MemoryChunkManager). For
> >> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be
> >> comparatively slower due to higher GC pressure
> >> (For example, in out Arrow-based query engine, to run TPC-DS SF1000, the
> >> overhead can be up to 3%). I can collect more
> >> benchmark results in future, maybe under a JIRA ticket or a PR.
> >>
> >> 8. Breaking changes?
> >> It doesn't break library-based allocation strategy selection like
> directly
> >> including arrow-memory-unsafe.jar to
> >> projects. However it breaks use of the previous
> AllocationManager.Factory
> >> API. Users should migrate to
> >> MemoryChunkAllocator API. Which in my opinion is simple to do.
> >>
> >> The implementation is still evolving, so if the links get out-of-date
> you
> >> can check on the newest branch head. Any
> >> suggestions welcome.
> >>
> >> Some other discussions that may be related to this topic:
> >>
> >>
> >>
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> >>
> >>
> >>
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> >>
> >> Best,
> >> Hongze
> >>
> >>
> >>
> >> [1]
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> >> MemoryChunkManager.java
> >> [2]
> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> >>
> >> [3]
> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> >> [4]
> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> >> [5]
> >>
> >>
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> >> [6]
> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> >> [7]
> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> >> [8]
> >>
> >>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> >>
> >>
> >>
>

Re:Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Hongze Zhang <no...@126.com>.
Hi Laurent,




Sorry I might describe it unclearly and yes, the purpose is straightforward: to discard (in another strategy rather than default) the usage of Java buffer ref counting and rely on GC to do cleanup for the buffers. Though some design changes to Allocator APIs maybe required to achieve this, for example, the decoupling of MemoryChunkAllocator from AllocationManager.




Should we make more discussion on it, or I can open a ticket asap?




Thanks,

Hongze




At 2021-10-05 13:39:55, "Laurent Goujon" <la...@dremio.com> wrote:
>Hi,
>
>I gave a quick look at your patches but to be honest, it's not easy to
>figure out the problem you're trying to solve in the first place. Or is it
>simply to discard the use of ref counting to track buffer usages and
>relying on Java references to keep track of buffers at the expense of
>creating more pressure on the GC itself to collect and free buffers?
>
>On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com> wrote:
>
>> Hi,
>>
>> I would like to discuss on the potential of introducing a GC-based
>> reference management strategy to Arrow Java, and we
>> have already been working on an implementation in our own project. I have
>> put the related codes in following branch and
>> if it makes sense to upstream Apache Arrow I can open a PR for it:
>>
>>
>> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
>>
>> In the branch, regarding Commit 1, Commit 2 and Commit 3:
>>
>> Commit 1: To break AllocationManager to two components:
>> MemoryChunkManager[1] which maintains BufferLedger-
>> BufferAllocator mappings, and MemoryChunkAllocator[2] which performs the
>> underlying allocate/destory operations. The
>> previous customizations such as NettyAllocationManager,
>> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
>> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
>>
>> Commit 2: To introduce a new implementation of MemoryChunkManager,
>> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
>> converts all managed chunks to WeakReferences, and a global thread will
>> observe on the assigned reference queue to
>> release the unused garbage chunks which are enqueued by JVM GC. Some
>> modes[4] are there to provide different strategies,
>> such as to disable manual reference management (default), or to enable
>> manual and GC-based reference management at the
>> same time, or to report leaks only.
>>
>> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to BufferAllocator.
>> This makes some special workloads, e.g.
>> cross-JNI buffer sharing much easier as we will no longer need an
>> Allocator directly binding to the shared memory
>> chunks, and will still be able to leverage all of Allocator's advantages
>> to manage the chunks (E.g. to use
>> MemoryChunkCleaner).
>>
>> Some possible Q&As about this implementation:
>>
>> 1. Why adding MemoryChunkAllocator, rather than just customizing
>> AllocationManager?
>> It is to reuse all of current underlying chunk allocation strategies,
>> Netty, Unsafe, and others. Also, with the layer of
>> MemoryChunk separated from AllocationManager we will be able to create
>> MemoryChunks actively in some special cases, e.g.
>> cross-JNI buffer sharing, C data interface buffer importing, then populate
>> the chunks to a manager BufferAllocator.
>>
>> 2. Why using WeakReference rather than PhantomReference?
>> In Java 8, PhantomReference has some issue against its referent object.
>> The object cannot be garbage collected before
>> being enqueued. In WeakReference we don't have this issue. See ref[5].
>>
>> 3. Why not sun.misc.Cleaner?
>> Cleaner API maintains a global doubly linked list to keep the Cleaner
>> instances alive. This brings overheads to us since
>> we will create local doubly linked list for cleaning up all the buffers on
>> Allocator close. See a unit test[6].
>>
>> 4. Why closing all buffers on Allocator close?
>> This behavior can be customizabale within Mode[7]s. A principle is, when
>> relying on GC, we should allow closing buffers
>> manually, or at least closing all of them on Allocator close. Or the
>> actual release time of the underlying chunks will
>> be unpredictable.
>>
>> 5. Can we use heap based buffers?
>> If I am not wrong, no. The heap objects can be physically moved around by
>> JVM. The addresses can vary.
>>
>> 6. When should GC happen?
>> A new AllocationListener implementation, GCTriger[8] is introduced. GC
>> will be performed when BufferAllocator is full.
>>
>> 7. Performance?
>> Based on previous measurement, it doesn't bring overheads on the legacy
>> path (by using default MemoryChunkManager). For
>> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be
>> comparatively slower due to higher GC pressure
>> (For example, in out Arrow-based query engine, to run TPC-DS SF1000, the
>> overhead can be up to 3%). I can collect more
>> benchmark results in future, maybe under a JIRA ticket or a PR.
>>
>> 8. Breaking changes?
>> It doesn't break library-based allocation strategy selection like directly
>> including arrow-memory-unsafe.jar to
>> projects. However it breaks use of the previous AllocationManager.Factory
>> API. Users should migrate to
>> MemoryChunkAllocator API. Which in my opinion is simple to do.
>>
>> The implementation is still evolving, so if the links get out-of-date you
>> can check on the newest branch head. Any
>> suggestions welcome.
>>
>> Some other discussions that may be related to this topic:
>>
>>
>> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
>>
>>
>> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
>>
>> Best,
>> Hongze
>>
>>
>>
>> [1]
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
>> MemoryChunkManager.java
>> [2]
>>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
>>
>> [3]
>>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
>> [4]
>>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
>> [5]
>>
>> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
>> [6]
>>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
>> [7]
>>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
>> [8]
>>
>> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
>>
>>
>>

Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Laurent Goujon <la...@dremio.com>.
Hi,

I gave a quick look at your patches but to be honest, it's not easy to
figure out the problem you're trying to solve in the first place. Or is it
simply to discard the use of ref counting to track buffer usages and
relying on Java references to keep track of buffers at the expense of
creating more pressure on the GC itself to collect and free buffers?

On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <no...@126.com> wrote:

> Hi,
>
> I would like to discuss on the potential of introducing a GC-based
> reference management strategy to Arrow Java, and we
> have already been working on an implementation in our own project. I have
> put the related codes in following branch and
> if it makes sense to upstream Apache Arrow I can open a PR for it:
>
>
> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
>
> In the branch, regarding Commit 1, Commit 2 and Commit 3:
>
> Commit 1: To break AllocationManager to two components:
> MemoryChunkManager[1] which maintains BufferLedger-
> BufferAllocator mappings, and MemoryChunkAllocator[2] which performs the
> underlying allocate/destory operations. The
> previous customizations such as NettyAllocationManager,
> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
>
> Commit 2: To introduce a new implementation of MemoryChunkManager,
> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> converts all managed chunks to WeakReferences, and a global thread will
> observe on the assigned reference queue to
> release the unused garbage chunks which are enqueued by JVM GC. Some
> modes[4] are there to provide different strategies,
> such as to disable manual reference management (default), or to enable
> manual and GC-based reference management at the
> same time, or to report leaks only.
>
> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to BufferAllocator.
> This makes some special workloads, e.g.
> cross-JNI buffer sharing much easier as we will no longer need an
> Allocator directly binding to the shared memory
> chunks, and will still be able to leverage all of Allocator's advantages
> to manage the chunks (E.g. to use
> MemoryChunkCleaner).
>
> Some possible Q&As about this implementation:
>
> 1. Why adding MemoryChunkAllocator, rather than just customizing
> AllocationManager?
> It is to reuse all of current underlying chunk allocation strategies,
> Netty, Unsafe, and others. Also, with the layer of
> MemoryChunk separated from AllocationManager we will be able to create
> MemoryChunks actively in some special cases, e.g.
> cross-JNI buffer sharing, C data interface buffer importing, then populate
> the chunks to a manager BufferAllocator.
>
> 2. Why using WeakReference rather than PhantomReference?
> In Java 8, PhantomReference has some issue against its referent object.
> The object cannot be garbage collected before
> being enqueued. In WeakReference we don't have this issue. See ref[5].
>
> 3. Why not sun.misc.Cleaner?
> Cleaner API maintains a global doubly linked list to keep the Cleaner
> instances alive. This brings overheads to us since
> we will create local doubly linked list for cleaning up all the buffers on
> Allocator close. See a unit test[6].
>
> 4. Why closing all buffers on Allocator close?
> This behavior can be customizabale within Mode[7]s. A principle is, when
> relying on GC, we should allow closing buffers
> manually, or at least closing all of them on Allocator close. Or the
> actual release time of the underlying chunks will
> be unpredictable.
>
> 5. Can we use heap based buffers?
> If I am not wrong, no. The heap objects can be physically moved around by
> JVM. The addresses can vary.
>
> 6. When should GC happen?
> A new AllocationListener implementation, GCTriger[8] is introduced. GC
> will be performed when BufferAllocator is full.
>
> 7. Performance?
> Based on previous measurement, it doesn't bring overheads on the legacy
> path (by using default MemoryChunkManager). For
> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be
> comparatively slower due to higher GC pressure
> (For example, in out Arrow-based query engine, to run TPC-DS SF1000, the
> overhead can be up to 3%). I can collect more
> benchmark results in future, maybe under a JIRA ticket or a PR.
>
> 8. Breaking changes?
> It doesn't break library-based allocation strategy selection like directly
> including arrow-memory-unsafe.jar to
> projects. However it breaks use of the previous AllocationManager.Factory
> API. Users should migrate to
> MemoryChunkAllocator API. Which in my opinion is simple to do.
>
> The implementation is still evolving, so if the links get out-of-date you
> can check on the newest branch head. Any
> suggestions welcome.
>
> Some other discussions that may be related to this topic:
>
>
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
>
>
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
>
> Best,
> Hongze
>
>
>
> [1]
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> MemoryChunkManager.java
> [2]
>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
>
> [3]
>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> [4]
>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> [5]
>
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> [6]
>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> [7]
>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> [8]
>
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
>
>
>

[DISCUSS][Java] Adding GC-Based reference management strategy for buffers

Posted by Hongze Zhang <no...@126.com>.
Hi,

I would like to discuss on the potential of introducing a GC-based reference management strategy to Arrow Java, and we
have already been working on an implementation in our own project. I have put the related codes in following branch and
if it makes sense to upstream Apache Arrow I can open a PR for it:


https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner

In the branch, regarding Commit 1, Commit 2 and Commit 3:

Commit 1: To break AllocationManager to two components: MemoryChunkManager[1] which maintains BufferLedger-
BufferAllocator mappings, and MemoryChunkAllocator[2] which performs the underlying allocate/destory operations. The
previous customizations such as NettyAllocationManager, UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.

Commit 2: To introduce a new implementation of MemoryChunkManager, MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
converts all managed chunks to WeakReferences, and a global thread will observe on the assigned reference queue to
release the unused garbage chunks which are enqueued by JVM GC. Some modes[4] are there to provide different strategies,
such as to disable manual reference management (default), or to enable manual and GC-based reference management at the
same time, or to report leaks only.

Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to BufferAllocator. This makes some special workloads, e.g. 
cross-JNI buffer sharing much easier as we will no longer need an Allocator directly binding to the shared memory
chunks, and will still be able to leverage all of Allocator's advantages to manage the chunks (E.g. to use
MemoryChunkCleaner).

Some possible Q&As about this implementation:

1. Why adding MemoryChunkAllocator, rather than just customizing AllocationManager?
It is to reuse all of current underlying chunk allocation strategies, Netty, Unsafe, and others. Also, with the layer of
MemoryChunk separated from AllocationManager we will be able to create MemoryChunks actively in some special cases, e.g.
cross-JNI buffer sharing, C data interface buffer importing, then populate the chunks to a manager BufferAllocator.

2. Why using WeakReference rather than PhantomReference?
In Java 8, PhantomReference has some issue against its referent object. The object cannot be garbage collected before
being enqueued. In WeakReference we don't have this issue. See ref[5].

3. Why not sun.misc.Cleaner?
Cleaner API maintains a global doubly linked list to keep the Cleaner instances alive. This brings overheads to us since
we will create local doubly linked list for cleaning up all the buffers on Allocator close. See a unit test[6].

4. Why closing all buffers on Allocator close?
This behavior can be customizabale within Mode[7]s. A principle is, when relying on GC, we should allow closing buffers
manually, or at least closing all of them on Allocator close. Or the actual release time of the underlying chunks will
be unpredictable.

5. Can we use heap based buffers?
If I am not wrong, no. The heap objects can be physically moved around by JVM. The addresses can vary.

6. When should GC happen?
A new AllocationListener implementation, GCTriger[8] is introduced. GC will be performed when BufferAllocator is full.

7. Performance?
Based on previous measurement, it doesn't bring overheads on the legacy path (by using default MemoryChunkManager). For
GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations can be comparatively slower due to higher GC pressure
(For example, in out Arrow-based query engine, to run TPC-DS SF1000, the overhead can be up to 3%). I can collect more
benchmark results in future, maybe under a JIRA ticket or a PR.

8. Breaking changes?
It doesn't break library-based allocation strategy selection like directly including arrow-memory-unsafe.jar to
projects. However it breaks use of the previous AllocationManager.Factory API. Users should migrate to
MemoryChunkAllocator API. Which in my opinion is simple to do.

The implementation is still evolving, so if the links get out-of-date you can check on the newest branch head. Any
suggestions welcome.

Some other discussions that may be related to this topic:

https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E

https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E

Best,
Hongze



[1] https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
MemoryChunkManager.java
[2]
https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java

[3]
https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
[4]
https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
[5]
https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
[6]
https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
[7]
https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
[8]
https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402