Posted to user@cassandra.apache.org by Michail Kotsiouros via user <us...@cassandra.apache.org> on 2022/09/19 10:45:00 UTC

Cassandra GC tuning

Hello community,
I observe some GC pauses while creating snapshots of a keyspace. The GC pauses themselves are not long, even though they are reported in the logs. The problem is the CPU utilization, which affects other applications deployed on my server.
Do you have any articles or recommendations about tuning GC in Cassandra?

Thank you in advance.
BR
MK

Re: Cassandra GC tuning

Posted by Patrick McFadin <pm...@gmail.com>.
GC tuning may seem like the best move, but more than likely, that is
just the smoke from the real fire. Can you go into more detail about your
configuration? Memory. CPU. Disk. Many times, GC is what shows up when
running out of disk bandwidth or when some other process is eating up
resources.

Patrick
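If GC is only the smoke, a cheap first check is how CPU time splits between real work and waiting on I/O while a snapshot runs; a large iowait share points at the disk rather than the collector. A minimal sketch, assuming a Linux host where the aggregate `cpu` line of `/proc/stat` lists cumulative jiffies in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names are hypothetical.

```python
from __future__ import annotations

import time


def read_cpu_times(stat_text: str) -> list[int]:
    """Return the cumulative jiffy counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal
            # (guest/guest_nice are already included in user/nice, so skip them)
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before: list[int], after: list[int]) -> dict[str, float]:
    """Percent of elapsed CPU time spent busy vs. waiting on I/O between two samples."""
    delta = [a - b for a, b in zip(after, before)]
    total = sum(delta)
    if total <= 0:
        return {"busy": 0.0, "iowait": 0.0}
    idle, iowait = delta[3], delta[4]
    return {
        "busy": 100.0 * (total - idle - iowait) / total,
        "iowait": 100.0 * iowait / total,
    }


def sample(interval_s: float = 1.0) -> dict[str, float]:
    """Take two /proc/stat readings interval_s apart (Linux only)."""
    with open("/proc/stat") as f:
        first = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        second = read_cpu_times(f.read())
    return cpu_breakdown(first, second)


if __name__ == "__main__":
    print(sample())
```

Run it once on a quiet node and once during `nodetool snapshot`; comparing the two makes the disk-versus-CPU question concrete before touching any GC flags.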


Re: Cassandra GC tuning

Posted by Patrick McFadin <pm...@gmail.com>.
Amy's Guide. Still getting it done after all these years. Legendary.

On Tue, Sep 20, 2022 at 6:05 AM Jeff Jirsa <jj...@gmail.com> wrote:

> Beyond this, there are two decent tuning sets, though they are relatively
> dated at this point.
>
> CASSANDRA-8150 proposed a number of changes to defaults based on how
> Cassandra had been tuned at a specific large (competent) user:
>
> ASF JIRA: <https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8150>
>
> Amy Tobey wrote this guide around the 2.0/2.1 era, so it assumes things
> like JDK 8 / CMS, but it still has more rigor than most other guides you'll
> find elsewhere and may help identify what's going on even if the specific
> tuning isn't super relevant in all cases:
>
> Amy's Cassandra 2.1 tuning guide (Amy Writes):
> <https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html>
>
> On Sep 20, 2022, at 5:27 AM, Michail Kotsiouros via user <
> user@cassandra.apache.org> wrote:
>
> Hello community,
>
> BTW, I am using Cassandra 3.11.4. From your comments, I understand that a
> CPU spike, and possibly a long GC, may be expected during snapshot creation
> under specific circumstances. I will monitor the resources during snapshot
> creation and come back with more news.
>
> Thanks a lot for your valuable input.
>
> BR
>
> MK
>
> *From:* Jeff Jirsa <jj...@gmail.com>
> *Sent:* Monday, September 19, 2022 20:06
> *To:* user@cassandra.apache.org; Michail Kotsiouros <
> michail.kotsiouros@ericsson.com>
> *Subject:* Re: Cassandra GC tuning
>
> https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0, you may
> find that tuning those thresholds
>
> On Mon, Sep 19, 2022 at 9:50 AM Jeff Jirsa <jj...@gmail.com> wrote:
>
> The pauses during snapshots are probably actually caused by a spike in disk
> I/O and disk latency, not GC (you'll see longer stop-the-world (STW) pauses
> while the JVM waits to reach a safepoint if that disk is hanging). This is
> especially problematic on SATA SSDs, or NVMe SSDs with poor I/O scheduler
> tuning. There's a patch somewhere to throttle hardlinks to try to mitigate
> this.
>
> On Mon, Sep 19, 2022 at 3:45 AM Michail Kotsiouros via user <
> user@cassandra.apache.org> wrote:
>
> Hello community,
>
> I observe some GC pauses while creating snapshots of a keyspace.
> The GC pauses themselves are not long, even though they are reported in the
> logs. The problem is the CPU utilization, which affects other applications
> deployed on my server.
>
> Do you have any articles or recommendations about tuning GC in Cassandra?
>
> Thank you in advance.
>
> BR
>
> MK

RE: Cassandra GC tuning

Posted by Michail Kotsiouros via user <us...@cassandra.apache.org>.
Hello everyone,
Sorry for not responding earlier. The GC observed was indeed a symptom. The CPU spike and the slow Cassandra node responses were due to a massive number of client processes connecting to the node. Most probably, this caused the GC as well.
The guides you shared do contain a lot of interesting points that are useful for optimizing Cassandra performance in general.

Thanks a lot once more for your comments and suggestions.

BR
MK
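Since the root cause here turned out to be a flood of client connections, one way to spot that next time is to count established connections to the CQL port, grouped by client address. A minimal sketch, assuming Linux, IPv4 only (it reads `/proc/net/tcp`; IPv6 clients appear in `/proc/net/tcp6`, which it ignores), and the default `native_transport_port` of 9042. Change `CQL_PORT` if your cassandra.yaml differs; the helper names are hypothetical.

```python
from __future__ import annotations

import socket
import struct
from collections import Counter

CQL_PORT = 9042          # default native_transport_port; check your cassandra.yaml
TCP_ESTABLISHED = "01"   # state code the Linux kernel prints in /proc/net/tcp


def decode_ipv4(hex_addr: str) -> str:
    """Convert the kernel's hex IPv4 address (native byte order) to dotted form."""
    return socket.inet_ntoa(struct.pack("=I", int(hex_addr, 16)))


def count_cql_clients(tcp_table: str, port: int = CQL_PORT) -> Counter:
    """Count ESTABLISHED connections to `port`, keyed by remote IPv4 address."""
    clients: Counter = Counter()
    for line in tcp_table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        local_port = int(local.split(":")[1], 16)
        if local_port == port and state == TCP_ESTABLISHED:
            clients[decode_ipv4(remote.split(":")[0])] += 1
    return clients


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        counts = count_cql_clients(f.read())
    print(f"{sum(counts.values())} established CQL connections")
    for addr, n in counts.most_common(10):
        print(f"{addr:>15}  {n}")
```

Sampling this every few seconds alongside CPU usage would show whether a connection storm lines up with the CPU spike and the GC log entries.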

Re: Cassandra GC tuning

Posted by Jeff Jirsa <jj...@gmail.com>.
Beyond this, there are two decent tuning sets, though they are relatively
dated at this point.

CASSANDRA-8150 proposed a number of changes to defaults based on how
Cassandra had been tuned at a specific large (competent) user:

ASF JIRA: <https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8150>

Amy Tobey wrote this guide around the 2.0/2.1 era, so it assumes things like
JDK 8 / CMS, but it still has more rigor than most other guides you'll find
elsewhere and may help identify what's going on even if the specific tuning
isn't super relevant in all cases:

Amy's Cassandra 2.1 tuning guide (Amy Writes):
<https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html>
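Both references tune away from the stock defaults, so it helps to know what those defaults are on a 3.x node like the one in this thread. A minimal sketch of the automatic heap sizing that, as far as I recall, `cassandra-env.sh` in the 3.x line performs when `MAX_HEAP_SIZE` and `HEAP_NEWSIZE` are unset: the max heap is the larger of min(1/2 RAM, 1 GB) and min(1/4 RAM, 8 GB), and the young generation is the smaller of 100 MB per core and 1/4 of the max heap. Treat it as an approximation and check the script shipped with your version; `default_heap_sizes` is a hypothetical name.

```python
from __future__ import annotations


def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> tuple[int, int]:
    """Approximate cassandra-env.sh (3.x) auto-sizing.

    Returns (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB, using integer division
    like the shell script's `expr` arithmetic.
    """
    half = min(system_memory_mb // 2, 1024)     # 1/2 of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)  # 1/4 of RAM, capped at 8 GB
    max_heap = max(half, quarter)
    max_sensible_young = 100 * cpu_cores         # 100 MB per core
    desired_young = max_heap // 4                # 1/4 of the heap
    return max_heap, min(max_sensible_young, desired_young)


if __name__ == "__main__":
    for ram_mb, cores in [(2048, 2), (16384, 8), (65536, 16)]:
        heap, young = default_heap_sizes(ram_mb, cores)
        print(f"{ram_mb} MB RAM, {cores} cores -> heap {heap} MB, young gen {young} MB")
```

For a 16 GB, 8-core machine, `default_heap_sizes(16384, 8)` returns `(4096, 800)`: a 4 GB heap with an 800 MB young generation, which is the kind of baseline that CASSANDRA-8150 and the 2.1 tuning guide argue about.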

  

  

  


RE: Cassandra GC tuning

Posted by Michail Kotsiouros via user <us...@cassandra.apache.org>.
Hello community,
BTW, I am using Cassandra 3.11.4. From your comments, I understand that a CPU spike, and possibly a long GC, may be expected during snapshot creation under specific circumstances. I will monitor the resources during snapshot creation and come back with more news.

Thanks a lot for your valuable input.

BR
MK
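One way to do this monitoring is to sample system load on a fixed interval for exactly as long as the snapshot command runs, so any spike can be lined up with the snapshot window. A minimal sketch, assuming a Unix host (for `os.getloadavg`) and `nodetool` on the PATH; the snapshot tag `gc-investigation` and keyspace `my_keyspace` are placeholders, and `monitor_during` is a hypothetical helper. Passing a CPU or disk sampler as `sample_fn` gives a finer-grained picture than the 1-minute load average.

```python
from __future__ import annotations

import os
import subprocess
import time
from typing import Callable


def monitor_during(cmd: list[str], sample_fn: Callable[[], float],
                   interval_s: float = 1.0) -> tuple[int, list[tuple[float, float]]]:
    """Run `cmd`, calling `sample_fn` every `interval_s` seconds until it exits.

    Returns (exit code, [(seconds since start, sample value), ...]).
    """
    start = time.monotonic()
    samples: list[tuple[float, float]] = []
    proc = subprocess.Popen(cmd)
    while True:
        samples.append((time.monotonic() - start, sample_fn()))
        try:
            rc = proc.wait(timeout=interval_s)
            break
        except subprocess.TimeoutExpired:
            continue  # still running; take another sample
    samples.append((time.monotonic() - start, sample_fn()))  # one reading after exit
    return rc, samples


if __name__ == "__main__":
    # Placeholder tag and keyspace; replace with your own.
    rc, samples = monitor_during(
        ["nodetool", "snapshot", "-t", "gc-investigation", "my_keyspace"],
        sample_fn=lambda: os.getloadavg()[0],  # 1-minute load average (Unix only)
    )
    peak = max(value for _, value in samples)
    print(f"exit={rc} samples={len(samples)} peak_load={peak:.2f}")
```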

Re: Cassandra GC tuning

Posted by Jeff Jirsa <jj...@gmail.com>.
https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0, you may
find that tuning those thresholds
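The throttling idea itself is easy to illustrate: a snapshot creates one hardlink per SSTable file, so pacing the `link()` calls spreads the burst of metadata writes over time instead of hitting a slow disk all at once. A minimal sketch of that idea in isolation, not Cassandra's implementation (CASSANDRA-13019 is the place to look for the real settings); `throttled_hardlink`, `links_per_second`, and the directory layout are assumptions for illustration.

```python
from __future__ import annotations

import os
import time
from pathlib import Path


def throttled_hardlink(src_dir: Path, dst_dir: Path, links_per_second: float) -> int:
    """Hardlink every regular file in src_dir into dst_dir, at most links_per_second.

    A rate of 0 or less means unthrottled. Returns the number of links created.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    min_gap = 1.0 / links_per_second if links_per_second > 0 else 0.0
    next_allowed = time.monotonic()
    created = 0
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        now = time.monotonic()
        if now < next_allowed:
            time.sleep(next_allowed - now)  # pace the link() calls
        os.link(src, dst_dir / src.name)    # same inode, no data copied
        created += 1
        next_allowed = time.monotonic() + min_gap
    return created
```

The trade-off is a longer snapshot: at 1,000 links per second, 10,000 files take roughly ten seconds instead of one burst.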


Re: Cassandra GC tuning

Posted by Jeff Jirsa <jj...@gmail.com>.
The pauses during snapshots are probably actually caused by a spike in disk
I/O and disk latency, not GC (you'll see longer stop-the-world (STW) pauses
while the JVM waits to reach a safepoint if that disk is hanging). This is
especially problematic on SATA SSDs, or NVMe SSDs with poor I/O scheduler
tuning. There's a patch somewhere to throttle hardlinks to try to mitigate
this.
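To check whether the disk, rather than the collector, is the bottleneck, compare disk utilization and average I/O latency during a snapshot against a quiet baseline. A minimal sketch that computes the same two figures `iostat -x` reports (%util and await) from two `/proc/diskstats` samples, assuming Linux and the standard field layout (reads completed, ms reading, writes completed, ms writing, and ms spent doing I/O at 0-based positions 3, 6, 7, 10 and 12). The device name `nvme0n1` is a placeholder for whatever holds your data directory.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class DiskCounters:
    ios: int      # reads + writes completed
    io_ms: int    # ms spent reading + writing, summed over requests
    busy_ms: int  # ms during which the device had I/O in flight


def read_disk(diskstats_text: str, device: str) -> DiskCounters:
    """Pull cumulative counters for one device out of /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        f = line.split()
        if len(f) >= 14 and f[2] == device:
            return DiskCounters(
                ios=int(f[3]) + int(f[7]),
                io_ms=int(f[6]) + int(f[10]),
                busy_ms=int(f[12]),
            )
    raise KeyError(device)


def util_and_await(before: DiskCounters, after: DiskCounters,
                   interval_ms: float) -> tuple[float, float]:
    """Return (%util, average ms per completed I/O) between two samples."""
    util = 100.0 * (after.busy_ms - before.busy_ms) / interval_ms
    ios = after.ios - before.ios
    await_ms = (after.io_ms - before.io_ms) / ios if ios else 0.0
    return min(util, 100.0), await_ms


if __name__ == "__main__":
    DEVICE = "nvme0n1"  # placeholder: the device holding the Cassandra data directory
    with open("/proc/diskstats") as fh:
        first = read_disk(fh.read(), DEVICE)
    t0 = time.monotonic()
    time.sleep(1.0)
    with open("/proc/diskstats") as fh:
        second = read_disk(fh.read(), DEVICE)
    util, await_ms = util_and_await(first, second, (time.monotonic() - t0) * 1000)
    print(f"{DEVICE}: util={util:.1f}% await={await_ms:.2f} ms")
```

A device pinned near 100% util with await climbing into tens of milliseconds during the snapshot would support the disk explanation over GC.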
