Posted to oak-dev@jackrabbit.apache.org by Timothée Maret <ti...@gmail.com> on 2015/06/08 19:35:36 UTC

Memory usage and large commits

Hi,

At Adobe, we use large commits to update our content repository atomically.
Those large commits require a large amount of heap memory; otherwise the JVM
throws OOMEs and the commit fails.

In one setup, we configure the JVM with a max heap size of 32GB, yet we
still hit OOMEs.
I looked at the heap dump taken at the time of the OOME and, running it
through Eclipse Memory Analyzer (MAT), I noticed that:

1. HashMap objects consume the most heap (~10GB);
2. 54% of the HashMap instances contain fewer than 12 elements;
3. ~40% of the HashMap instances contain exactly 1 element (see the
measurement sketch after this list); and
4. The ModifiedNodeState instances retain ~10GB of HashMaps.
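
To put rough numbers on observations 2 and 3, the following sketch compares
the retained size of a one-entry HashMap created with the default capacity
against one presized for a single entry. It assumes JOL (org.openjdk.jol) is
on the classpath, and the exact byte counts depend on the JVM and on pointer
compression; the point is only that the default-capacity map drags along a
16-slot table for a single entry.

import java.util.HashMap;
import java.util.Map;

import org.openjdk.jol.info.GraphLayout;

public class HashMapFootprint {

    public static void main(String[] args) {
        // One entry in a map created with the default capacity:
        // the first put allocates a 16-slot table.
        Map<String, String> defaultCapacity = new HashMap<>();
        defaultCapacity.put("key", "value");

        // One entry in a map presized for a single element.
        Map<String, String> presized = new HashMap<>(1);
        presized.put("key", "value");

        // Retained size of each map, including its table array, entry
        // node and the referenced key/value strings.
        System.out.println("default:  " + GraphLayout.parseInstance(defaultCapacity).totalSize());
        System.out.println("presized: " + GraphLayout.parseInstance(presized).totalSize());
    }
}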

Since HashMaps account for the vast majority of the memory consumed, memory
consumption could be reduced by using HashMaps with a higher fill ratio.
Looking at the code in [0], it seems HashMaps are sometimes created with the
default capacity.
Specifying the initial capacity of every new HashMap instance in [0] as
either the required capacity or 1 (if there is no better guess) would improve
the HashMap fill ratio and thus decrease the memory footprint of commits.
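
For illustration, here is a minimal sketch of the kind of change I mean. The
map below is just an example and not the actual field in [0]; Oak already
depends on Guava, whose Maps.newHashMapWithExpectedSize accounts for the
default 0.75 load factor:

import java.util.HashMap;
import java.util.Map;

import com.google.common.collect.Maps;

public class PresizedMaps {

    public static void main(String[] args) {
        // Default capacity: the first put allocates a 16-slot table,
        // even if the map will only ever hold a single property.
        Map<String, Object> defaultSized = new HashMap<>();
        defaultSized.put("jcr:primaryType", "nt:unstructured");

        // Presized via Guava: large enough to hold 'expected' entries
        // without resizing, taking the 0.75 load factor into account.
        int expected = 1;
        Map<String, Object> presized = Maps.newHashMapWithExpectedSize(expected);
        presized.put("jcr:primaryType", "nt:unstructured");

        // The same by hand with plain java.util.HashMap.
        Map<String, Object> manual =
                new HashMap<>((int) Math.ceil(expected / 0.75d), 0.75f);
        manual.put("jcr:primaryType", "nt:unstructured");
    }
}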

wdyt?

Regards,

Timothee

[0]
org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState#ModifiedNodeState

Re: Memory usage and large commits

Posted by Timothée Maret <ti...@gmail.com>.
Hi,

I have opened OAK-2989 to track improving support for arbitrarily large
commits in Oak core.

Regards,

Timothee



Re: Memory usage and large commits

Posted by Timothée Maret <ti...@gmail.com>.
Hi Michael,


2015-06-09 9:49 GMT+01:00 Michael Dürig <md...@apache.org>:

>
> Changing the hash map fill ratio sounds like a workaround to me. It will
> only push the problem a little further out.
>

I agree that this only pushes the limit a little further.
However, I believe it still makes sense to apply this improvement, as it is
really cheap to implement and reduces the memory footprint.
I have opened OAK-2968 and OAK-2969.


>
> Oak should be able to handle arbitrarily large commits. So we should find
> and fix the root cause of this.
>
> With which backend does this occur?


Mongo / DocumentStore


> At which phase of the commit?


AFAIK, while building the commit. How can I make sure?


> Did you collect thread dumps from the time when these hash maps start
> piling up?
>

No. Though I have a heap dump taken at the time an OOME occurred on an
instance (the JVM ran with -XX:+HeapDumpOnOutOfMemoryError).
I cannot share that heap dump because it contains customer data and it is
fairly big (32 GB uncompressed).

However, I have attached a high-level view of the dominator tree which
identifies the main heap users (ModifiedNodeState & UpdateOp).
Both classes contain HashMap/HashSet fields whose capacity is initialised to
the default.

This instance seems to have a rather deep but not wide content tree.


>
> ModifiedNodeState instances are used to collect transient changes in
> memory up to a certain point. Afterwards changes should be written ahead to
> the backend.


Ok, that would make sense and would indeed allow handling arbitrarily large
commits.
As things stand, there seems to be no other way than buying a heap of RAM or
letting the OS swap memory to disk for ages.

Regards,

Timothee


>
>
> Michael

Re: Memory usage and large commits

Posted by Michael Dürig <md...@apache.org>.
Changing the hash map fill ratio sounds like a workaround to me. It
will only push the problem a little further out.

Oak should be able to handle arbitrarily large commits. So we should find
and fix the root cause of this.

With which backend does this occur? At which phase of the commit? Did 
you collect thread dumps from the time when these hash maps start piling up?

ModifiedNodeState instances are used to collect transient changes in 
memory up to a certain point. Afterwards changes should be written ahead 
to the backend.
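
For illustration only, a rough sketch of that idea. The names and the
threshold below are hypothetical and do not reflect the actual
DocumentNodeStore implementation; the point is just that transient changes
are buffered in memory and spilled to the backend once a limit is reached:

import java.util.HashMap;
import java.util.Map;

public class WriteAheadBuffer {

    /** Number of pending changes above which we flush to the backend. */
    private static final int UPDATE_LIMIT = 10_000;

    private final Map<String, Object> pendingChanges = new HashMap<>();
    private final Backend backend;

    public WriteAheadBuffer(Backend backend) {
        this.backend = backend;
    }

    /** Records a change; spills to the backend once the limit is reached. */
    public void setProperty(String path, Object value) {
        pendingChanges.put(path, value);
        if (pendingChanges.size() >= UPDATE_LIMIT) {
            // Write the buffered changes ahead to the backend so the heap
            // footprint stays bounded regardless of commit size.
            backend.writeAhead(pendingChanges);
            pendingChanges.clear();
        }
    }

    /** Minimal backend abstraction, purely for illustration. */
    public interface Backend {
        void writeAhead(Map<String, Object> changes);
    }
}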

Michael
