Posted to dev@mxnet.apache.org by Naveen Swamy <mn...@gmail.com> on 2018/09/11 21:36:36 UTC

Off-Heap Memory Management in MXNet Scala

Hi All,

I am working on Off-Heap Memory Management and have written a proposal,
linked below, based on my prototype and the research I did.

Please review the doc and provide your feedback.

https://cwiki.apache.org/confluence/display/MXNET/JVM+Memory+Management

I had offline discussions with a few people I work with and have added
their feedback to the doc as well.

Thanks, Naveen

Re: Off-Heap Memory Management in MXNet Scala

Posted by Naveen Swamy <mn...@gmail.com>.
Thank you all for your feedback.

@Chris: Yes. One of the Amazon users (Calum Leslie) contributed the
Dispose Pattern, removing the freeing of native handles from Finalizers and
logging a message there instead. This was done because calling free in
Finalizers was segfaulting the application at random points and was very
hard to reproduce and debug.
The dispose pattern worked for some cases but made the code cumbersome to
read, since you have to keep track of every object that is created (imagine
slice/reshape: instead of writing expressions, you are now creating
unnecessary variables and calling dispose on them).
As the 1st graph in the design shows, despite carefully calling dispose on
most objects, there was a constant memory leak, and diagnosing leaks wasn't
straightforward. Note that Finalizers run on a separate thread, some time
after the object was found unreachable.
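
To make the readability point concrete, here is a minimal Java sketch. It is
not MXNet code: FakeArray, reshape(), slice() and the liveHandles counter are
hypothetical stand-ins for a wrapper around a native handle, used only to show
how a chained expression leaves a temporary that nobody can dispose.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DisposeSketch {
    // Stand-in for native allocations: counts handles that were not freed yet.
    static final AtomicInteger liveHandles = new AtomicInteger();

    // Hypothetical wrapper around a native handle (NOT the MXNet NDArray class).
    static final class FakeArray {
        private boolean disposed = false;

        FakeArray() { liveHandles.incrementAndGet(); }

        // Each operation allocates a new native-backed result.
        FakeArray reshape() { return new FakeArray(); }
        FakeArray slice() { return new FakeArray(); }

        void dispose() {
            if (!disposed) {
                disposed = true;
                liveHandles.decrementAndGet();
            }
        }
    }

    // Expression style: the temporary returned by reshape() has no name,
    // so it can never be disposed.
    static int chained() {
        liveHandles.set(0);
        FakeArray a = new FakeArray();
        FakeArray result = a.reshape().slice();
        result.dispose();
        a.dispose();
        return liveHandles.get(); // handles still alive after "cleanup"
    }

    // Dispose style: every temporary needs a variable so it can be freed.
    static int explicit() {
        liveHandles.set(0);
        FakeArray a = new FakeArray();
        FakeArray reshaped = a.reshape();
        FakeArray result = reshaped.slice();
        result.dispose();
        reshaped.dispose();
        a.dispose();
        return liveHandles.get();
    }

    public static void main(String[] args) {
        System.out.println("chained leaks:  " + chained());
        System.out.println("explicit leaks: " + explicit());
    }
}
```

In this toy model the chained version leaves one handle alive, and the explicit
version leaves none at the cost of an extra variable per intermediate result.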

@Timur, thanks for the feedback.
1) No, the goal here is to manage the native memory that is created for
various operations. In MXNet-Scala most objects live on the C++ heap and the
Scala objects are wrappers around them; when the MXNet engine runs
operations, it expects the objects to be accessible on the C++ heap.

2) Agreed, MNIST is not representative. The goal was to understand and show
that the existing code has hard-to-debug memory leaks (even for MNIST). I
was aiming to test my prototype code and see if my changes make a
difference. Yizhi suggested I run tests against the ResNet-50 model, which I
will do as part of my implementation. I think this is a standard benchmark
model that is widely used. Also note that most of the MXNet-Scala use cases
we have seen are for inference.

3) No, we haven't created a branch for the Java API work. Please look at
this design and kindly leave your feedback:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API

4) Calling System.gc() will be configurable (including an option not to call
GC at all). One piece of feedback I got from a user is that calling
System.gc() on the user's behalf is intrusive, which I think is also the
point you are making.

5) Understood and agreed. I see calling GC as only one part of the solution,
and as a configurable option. For GPUs, training, and other memory-intensive
applications, ResourceScope would be a very good option.
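
Points 4 and 5 could look roughly like the Java sketch below. This is not the
ResourceScope API from the design doc: Scope, register() and gcEnabled() are
hypothetical names, and the "interval in seconds, 0 disables" setting is an
assumption made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ScopeSketch {
    // Hypothetical scope: everything registered is released when it closes.
    static final class Scope implements AutoCloseable {
        private final Deque<Runnable> releasers = new ArrayDeque<>();

        // Register an action that frees one native resource.
        void register(Runnable releaser) { releasers.push(releaser); }

        @Override
        public void close() {
            // Release in reverse order of registration.
            while (!releasers.isEmpty()) {
                releasers.pop().run();
            }
        }
    }

    // Hypothetical setting: seconds between System.gc() calls.
    // Zero, negative, unset or malformed values mean "never call GC for the user".
    static boolean gcEnabled(String intervalSetting) {
        try {
            return Long.parseLong(intervalSetting.trim()) > 0;
        } catch (NumberFormatException | NullPointerException e) {
            return false;
        }
    }

    // Registers three fake resources and returns how many were freed on exit.
    static int demo() {
        int[] freed = {0};
        try (Scope scope = new Scope()) {
            for (int i = 0; i < 3; i++) {
                scope.register(() -> freed[0]++);
            }
        } // close() runs here, even if the body throws
        return freed[0];
    }

    public static void main(String[] args) {
        System.out.println("freed on scope exit: " + demo());
        System.out.println("gc enabled for \"0\": " + gcEnabled("0"));
    }
}
```

The scope releases memory at a known point without waiting for the collector,
which is why it suits GPU and training workloads; the GC setting stays opt-in.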

Another alternative is to create ByteBuffers in Java and map the C++
pointers to the JVM heap by tapping into the native malloc/free; that way
the JVM is aware of all the memory that is allocated and can free it
appropriately whenever the objects become unreachable. I have to note that
this still does not solve the problem of memory accumulating until GC kicks
in. This approach is also very involved and might not be tenable.

@Marco, thanks for your comments.
1) The JVM kicks off GC when it feels pressure on the JVM heap, not on CPU
RAM. Objects on the GPU are not special: they are still off-heap (outside
the JVM heap), so this would work. Look at the graph in the doc that shows
the GAN example running on GPUs.

2) I am not looking to rewrite memory allocation in MXNet; that will still
be handled by the C++ backend. The goal here is to free native memory
(reduce the shared pointer count) when JVM objects go out of scope (become
unreachable).
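
One way to picture "free native memory when the JVM object becomes
unreachable" is with the standard java.lang.ref classes, as in the sketch
below. It is an illustration under assumptions, not the prototype's code:
HandleRef, track() and drain() are hypothetical names, and the native free is
simulated with a counter.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PhantomSketch {
    // Counts simulated native frees (stand-in for dropping a shared pointer count).
    static final AtomicInteger nativeFrees = new AtomicInteger();

    // The GC puts a reference on this queue once its referent is unreachable.
    static final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // Keep the reference objects strongly reachable until they are processed.
    static final Set<HandleRef> pending = ConcurrentHashMap.newKeySet();

    // Phantom reference that remembers the native handle of its wrapper.
    static final class HandleRef extends PhantomReference<Object> {
        final long handle;

        HandleRef(Object wrapper, long handle) {
            super(wrapper, queue);
            this.handle = handle;
        }
    }

    // Start tracking a JVM wrapper and the native handle it owns.
    static HandleRef track(Object wrapper, long handle) {
        HandleRef ref = new HandleRef(wrapper, handle);
        pending.add(ref);
        return ref;
    }

    // Free the native side of every wrapper reported as unreachable.
    static int drain() {
        int freed = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            if (pending.remove(r)) {
                nativeFrees.incrementAndGet(); // real code would call the native free
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) throws InterruptedException {
        Object wrapper = new Object();
        track(wrapper, 42L);
        wrapper = null;      // drop the only strong reference
        System.gc();         // a hint only; timing is not guaranteed
        Thread.sleep(100);
        System.out.println("freed after GC hint: " + drain());
    }
}
```

Unlike a finalizer, the cleanup runs on whichever thread calls drain(), so the
application decides when and where the native frees happen.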


@Carin, yes, hopefully this will alleviate the memory management headache
for our users.

Hope that makes sense.

Thanks, Naveen


On Wed, Sep 12, 2018 at 6:06 AM, Carin Meier <ca...@gmail.com> wrote:

> Naveen,
>
> Thanks for putting together the detailed document and kickstarting this
> effort. It will benefit all the MXNet JVM users and will help solve a
> current pain point for them.
>
> - Carin

Re: Off-Heap Memory Management in MXNet Scala

Posted by Carin Meier <ca...@gmail.com>.
Naveen,

Thanks for putting together the detailed document and kickstarting this
effort. It will benefit all the MXNet JVM users and will help solve a
current pain point for them.

- Carin

Re: Off-Heap Memory Management in MXNet Scala

Posted by Pedro Larroy <pe...@gmail.com>.
Hi Naveen

Great document. I spent some time understanding your proposal and I checked
out the linked talk on YouTube from Boehm.

The topic is complex and the solution needs to be well thought out, and it
seems the document reflects that.

1-  One thing that was not clear to me from the document: if we use phantom
references, are the other solutions such as ResourceScope.using and dispose
still available, or will there be only one way to manage the memory?

In terms of users, I think it's always better to have one recommended way
to do things even if there are different options.

2- Also, Boehm suggests explicitly disposing of objects in addition to the
call to the GC. In our case this is not needed?

Pedro.


Re: Off-Heap Memory Management in MXNet Scala

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
Interesting and detailed document!

The JVM garbage collector gets executed depending on the memory pressure
for CPU RAM (or a different custom strategy). It was mentioned that this
document also supports disposing GPU objects. Sorry if I missed it, but how
exactly are we ensuring we don't run out of memory on GPU?

There are quite a lot of cases where we get close to the limit of the
available GPU RAM even with explicit disposes. If we run out of memory, we
get a fatal exception and it's basically game over (as of the current
state). How do we handle these cases where we can't rely on paging and the
benefits of virtual memory?

Best regards,
Marco

Timur Shenkao <ts...@timshenkao.su> wrote on Wed, 12 Sep 2018, 09:59:

> Thanks for the great job!
>
> My questions / proposals.
> 1) Have you considered Java collections with a low memory footprint, like
> Fastutil, Koloboke, etc.? They are much more memory-efficient and they have
> "better correspondence" with low-level data types.
> 2) The MNIST example on the page is "bad" because MNIST is handled pretty
> fast even on a laptop, i.e. we won't catch GC & off-heap problems.
> 3) Is it the 1.2.0-java branch where the Java API things happen?
> 4) System.gc() behaves differently across JVM platforms, JDK
> implementations, and GC types. So, I am sure that we will get users'
> requests to eliminate this approach in the future.
> 5) Frameworks like MXNet aren't used in isolation. Folks have to integrate
> Spark or Spring with DL libraries. And in this case, they often use CMS or
> even more archaic GCs, as G1GC isn't always good for streaming or
> long-living jobs.

Re: Off-Heap Memory Management in MXNet Scala

Posted by Timur Shenkao <ts...@timshenkao.su>.
Thanks for the great job!

My questions / proposals.
1) Have you considered Java collections with a low memory footprint, like
Fastutil, Koloboke, etc.? They are much more memory-efficient and they have
"better correspondence" with low-level data types.
2) The MNIST example on the page is "bad" because MNIST is handled pretty
fast even on a laptop, i.e. we won't catch GC & off-heap problems.
3) Is it the 1.2.0-java branch where the Java API things happen?
4) System.gc() behaves differently across JVM platforms, JDK
implementations, and GC types. So, I am sure that we will get users'
requests to eliminate this approach in the future.
5) Frameworks like MXNet aren't used in isolation. Folks have to integrate
Spark or Spring with DL libraries. And in this case, they often use CMS or
even more archaic GCs, as G1GC isn't always good for streaming or
long-living jobs.



On Wednesday, September 12, 2018, Chris Olivier <cj...@gmail.com>
wrote:

> Do you log on finalize() if the object wasn’t properly freed (i.e.
> NDArray.finalize())? Is that available in Scala?

Re: Off-Heap Memory Management in MXNet Scala

Posted by Chris Olivier <cj...@gmail.com>.
Do you log on finalize() if the object wasn’t properly freed (i.e.
NDArray.finalize())? Is that available in Scala?

On Tue, Sep 11, 2018 at 6:12 PM Qing Lan <la...@live.com> wrote:

> Nice document! Way better than current .dispose() in Scala!
>
> Thanks,
> Qing

Re: Off-Heap Memory Management in MXNet Scala

Posted by Qing Lan <la...@live.com>.
Nice document! Way better than current .dispose() in Scala!

Thanks,
Qing

On 9/11/18, 6:04 PM, "Chris Olivier" <cj...@gmail.com> wrote:

    wow, incredible document!
    


Re: Off-Heap Memory Management in MXNet Scala

Posted by Chris Olivier <cj...@gmail.com>.
wow, incredible document!
