Posted to user@cassandra.apache.org by Randy Lynn <rl...@getavail.com> on 2018/06/28 16:23:27 UTC

JVM Heap erratic

I have datadog monitoring JVM heap.

Running 3.11.1.
20GB heap
G1 for GC.. all the G1GC settings are out-of-the-box

Does this look normal?

https://drive.google.com/file/d/1hLMbG53DWv5zNKSY88BmI3Wd0ic_KQ07/view?usp=sharing

I'm a C# .NET guy, so I have no idea if this is normal Java behavior.



-- 
Randy Lynn
rlynn@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

getavail.com <https://www.getavail.com/>

Re: [EXTERNAL] Re: JVM Heap erratic

Posted by Randy Lynn <rl...@getavail.com>.
Alain - Awesome information!!!
I had made some changes before seeing your email.

Current setup
CMS
16G heap
6G Eden
I also reduced the initiating occupancy (CMSInitiatingOccupancyFraction)
from 75 to 60. The thinking was that GC will kick in sooner, giving me
headroom for a burst while GC catches up?? Maybe that's completely the
wrong thinking?

MaxTenuringThreshold is still set to 1, so I'm eager to try what you're
suggesting. Our SurvivorRatio is set to 8 also.
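
In jvm.options terms (3.11) that setup maps onto roughly the flags below -
a sketch only; the file layout and the stock CMS flags are assumptions:

-Xms16G
-Xmx16G
-Xmn6G                                  # ~6G new gen (the "6G Eden" above)
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=60
-XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxTenuringThreshold=1
-XX:SurvivorRatio=8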

Thanks for a well thought out, and detailed explanation!!

Randy

-- 
Randy Lynn
rlynn@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

getavail.com <https://www.getavail.com/>

RE: [EXTERNAL] Re: JVM Heap erratic

Posted by "Durity, Sean R" <SE...@homedepot.com>.
THIS! A well-reasoned and clear explanation of a very difficult topic. This is the kind of gold that a user mailing list can provide. Thank you, Alain!


Sean Durity

Re: JVM Heap erratic

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello Randy,

It's normal for the heap to show this pattern. Java uses the memory
available and, when needed, cleans some of it up for new allocations;
that's the variation you see. In your case it's not very regular, but that
can depend on the workload as well.

I'm a C# .NET guy, so I have no idea if this is normal Java behavior.


I feel you. I started operating Cassandra with no clue about garbage
collection and other JVM stuff. The first time I tuned it, with some former
colleagues, we ended up removing half of the nodes of the cluster and still
cut latency in half. It is an important part of Cassandra to tune, and
people (including myself) often overlook it because it's too complex. I'll
try to give you the big picture so you can do some analysis of what's going
on and hopefully do this cluster some good ("some good" - maybe not
removing half of the nodes and halving the latency, that was a really
strong improvement on a badly tuned GC, but let's see :) ).

The heap is a limited amount of memory used to store Java objects. It's
composed of 3 sections: the New Generation, the Old Generation and the
Permanent Generation. New objects go to the New Gen ('HEAP_NEWSIZE' with
CMS, auto in G1GC - do not set it there). From time to time, depending on
usage and tuning, surviving objects are pushed from the Eden space, where
they first land, to one of the 2 survivor spaces (the other one is empty).
Then, depending on the tenuring threshold option (in CMS, auto in G1GC too
I believe), the data is passed from one survivor to the other, expiring old
objects in the process. This cleaning process in the New Gen is called a
minor garbage collection (minor GC) and is triggered when Eden is full.
Once the tenuring threshold is reached and an object has survived that many
moves between the survivor spaces, it is promoted (or tenured) to the Old
Gen. When the Old Gen fills up with promoted objects it is collected in
turn; that is the major GC.
This is the most expensive GC, and even though it will have to happen from
time to time in almost all cases, it's worth reducing the total duration
and frequency of major GCs to improve GC behaviour overall. We can ignore
the Permanent Gen, which does not trigger any significant GC activity.
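
To actually watch objects age through the survivor spaces and get promoted
on a live node, the standard HotSpot (JDK 8) flags below print a tenuring
distribution at every minor GC - a sketch; the log path is only an example:

-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution   # object ages in the survivors per minor GC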

Some more information is available here:
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html

In Cassandra, especially in read-heavy workloads, objects can often expire
before being promoted if given enough space and time to do so. And this is
far more performant than promoting objects simply because we didn't let
them survive long enough in the New Gen.

Using CMS with 20 GB is not recommended (out of the box, as a starting
point at least) because CMS performance is known to degrade quickly with
heaps bigger than 8 GB. 20 GB is a lot. It also depends on the total memory
available.

tried 8GB = OOM
> tried 12GB = OOM
> tried 20GB w/ G1 = OOM (and long GC pauses usually over 2 secs)
> tried 20GB w/ CMS = running
>

OOMs are not only related to the space available but also to the inability
to clean the heap efficiently enough before the space is needed. Thus
tuning a few more options than just the heap size might help.
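
If the OOMs keep happening, it can also help to capture a heap dump at the
moment of failure and look at what is actually filling the heap. These are
standard HotSpot flags; the path is only an example and needs enough free
disk (a dump can be as large as the heap):

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/cassandra/heapdump.hprof   # inspect with a heap analyzer, e.g. Eclipse MAT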

CMS (over G1GC)
HEAP: 8 to 16 GB.
NEW_HEAP: 25 to 50% - nothing to do with the CPU core count, contrary to
the documentation/comments in the file imho
MaxTenuringThreshold: 15 - From 1 all the way up to 15, that's what gave me
the best results in the past. It reduces major GC and makes the most of the
New Gen/minor GC, which are less impacting but still "stop the world" GCs.
The default is 1, which is often way too short to expire objects...
SurvivorRatio: 2 to 8 - controls the size of the survivor spaces. Each
survivor space will be 'New Gen Size / (SurvivorRatio + 2)', and the two
survivors together take twice that. What works best depends on how fast the
Eden space fills up. Increasing the survivor spaces will diminish the Eden
space (where new objects are allocated), so there is a tradeoff here as
well and a balance to find.
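
A quick worked example of that sizing, assuming an 8 GB New Gen:

SurvivorRatio=4:  each survivor = 8 GB / (4 + 2) ~ 1.33 GB  ->  Eden ~ 5.33 GB
SurvivorRatio=8:  each survivor = 8 GB / (8 + 2) = 0.80 GB  ->  Eden  = 6.40 GB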

I would try these settings on a canary node:
HEAP - 16 GB (if read-heavy; if not, between 8 and 12 GB is probably better).
NEW_HEAP - 50% of the heap (4 - 8 GB)
MaxTenuringThreshold: 15
SurvivorRatio: 4
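
In jvm.options terms (a sketch; it assumes the stock CMS flags stay in
place and that NEW_HEAP is set via -Xmn / HEAP_NEWSIZE), that canary
configuration would look roughly like:

-Xms16G
-Xmx16G
-Xmn8G                        # NEW_HEAP: 50% of the heap
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:MaxTenuringThreshold=15
-XX:SurvivorRatio=4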

When testing GC settings, there is no better way than using a canary node:
pick one rack and the node(s) in that rack you want to test on. This should
not impact availability or consistency. If you're able to reproduce the
workload perfectly in a staging cluster, that's perfect, but I don't know
many companies/people able to do this, and the use of a canary node should
be safe :).

I could probably share some thoughts on what the cluster really needs,
rather than making guesses and suggesting a somewhat arbitrary tuning as I
did above, if you would share a gc.log file with us from one of the nodes.
Garbage collection tuning is a bit tricky, but a good tuning can divide
latency while cutting the number of hosts; I have seen really impressive
changes in the past.
There is a lot of detail in this log file about where the biggest pressure
is, the allocation rate, the GC duration distribution for each type of GC,
etc. With this, I could see where the pressure is and suggest how to work
on it.

Be aware that extra GC is also sometimes the consequence (and not the
cause) of an issue. Due to pending requests, wide partitions, ongoing
compactions, repairs or an intensive workload, GC pressure can increase and
mask another, underlying root issue. You might want to check that the
cluster is healthy beyond GC, as a lot of distinct internal parts of
Cassandra have an impact on the GC.
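
A few nodetool checks give a quick read on those points (sub-command names
as in 3.11; the keyspace/table names are placeholders):

nodetool tpstats                         # pending/blocked stages (request backlog)
nodetool compactionstats                 # compactions currently running
nodetool tablestats my_ks.my_table       # partition size estimates, tombstones per read
nodetool tablehistograms my_ks my_table  # partition size / cell count percentiles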

Hope that helps,

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: JVM Heap erratic

Posted by Elliott Sims <el...@backblaze.com>.
Odd.  Your "post-GC" heap level seems a lot lower than your max, which
implies that you should be OK with ~10GB.  I'm guessing either you're
genuinely getting a huge surge in needed heap and running out, or it's
falling behind and garbage is building up.  If the latter, there might be
some tweaking you can do.  Probably worth turning on GC logging and digging
through exactly what's happening.
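
For reference, on JDK 8 GC logging is typically enabled with flags along
these lines (Cassandra 3.11 ships a similar commented-out block in
conf/jvm.options, if I remember right); the path and sizes are examples:

-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime   # total stop-the-world time per pause
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M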

CMS is kind of hard to tune and can have problems with heap fragmentation
since it doesn't compact, but if it's working for you I'd say stick with it.


Re: JVM Heap erratic

Posted by Randy Lynn <rl...@getavail.com>.
Thanks for the feedback..

Getting tons of OOM lately..

You mentioned overprovisioned heap size... well...
tried 8GB = OOM
tried 12GB = OOM
tried 20GB w/ G1 = OOM (and long GC pauses usually over 2 secs)
tried 20GB w/ CMS = running

we're java 8 update 151.
3.11.1.

We've got one table that's got a 400MB partition.. that's the max.. the
99th is < 100MB, and 95th < 30MB..
So I'm not sure that I'm overprovisioned; I'm just not yet at the heap
size our partition sizes call for.
All queries use a clustering key, so I'm not accidentally reading a whole
partition.
The last place I'm looking - which maybe should be the first - is
tombstones.
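
To check that, something along these lines should work on 3.11 (keyspace/
table/file names are placeholders; the quoted stat labels are from memory):

nodetool tablestats my_ks.my_table         # "Average/Maximum tombstones per slice"
sstablemetadata /path/to/mc-1-big-Data.db  # "Estimated droppable tombstones"

Cassandra also logs a warning when a single read scans more tombstones than
tombstone_warn_threshold (cassandra.yaml), which is worth grepping for.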

sorry for the afternoon rant! thanks for your eyes!



-- 
Randy Lynn
rlynn@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

getavail.com <https://www.getavail.com/>

Re: JVM Heap erratic

Posted by Elliott Sims <el...@backblaze.com>.
It depends a bit on which collector you're using, but fairly normal.  Heap
grows for a while, then the JVM decides via a variety of metrics that it's
time to run a collection.  G1GC is usually a bit steadier and less sawtooth
than the Parallel Mark Sweep, but if your heap's a lot bigger than needed
I could see it producing that pattern.
