
Posted to user@cassandra.apache.org by "Steinmaurer, Thomas" <th...@dynatrace.com> on 2019/10/22 10:47:13 UTC

Cassandra 2.1.18 - Question on stream/bootstrap throughput

Hello,

Using 2.1.18, 3 nodes (m4.10xlarge, EBS SSD-based), vnodes=256, RF=3, we are trying to add a 4th node.

To my knowledge, the two options that mainly affect throughput are stream output and compaction throttling. Both have been set to very high values (e.g. stream output = 800 Mbit/s and compaction throughput = 500 MByte/s) or even to 0 (unthrottled) in cassandra.yaml, followed by a process restart. In both scenarios (throttling with high values vs. unthrottled), the 4th node is streaming from one node, capped at ~180-200 Mbit/s, according to our SFM.

The nodes have plenty of resources available (10 Gbit network, disk I/O and IOPS), as also confirmed by e.g. iperf for network throughput and by writing to / reading from disk in the area of 200 MByte/s.
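
For scale, the figures in this message mix two units: Mbit/s for streaming and the network, MByte/s for compaction and disk. The sketch below only converts the numbers quoted above; the helper name is made up for illustration.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


# Figures quoted in the message, all in Mbit/s.
figures_mbit = {
    "observed stream rate (low)": 180,
    "observed stream rate (high)": 200,
    "configured stream output cap": 800,
    "network link (10 Gbit)": 10_000,
}

for label, mbit in figures_mbit.items():
    print(f"{label}: {mbit} Mbit/s = {mbit_to_mbyte(mbit):.1f} MByte/s")
```

In these terms the observed 180-200 Mbit/s is about 22.5-25 MByte/s, roughly a quarter of the 800 Mbit/s cap and an eighth of the ~200 MByte/s disk figure.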

Are there any other known throughput / bootstrap limitations that basically override the above settings?

Thanks,
Thomas


The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria, Freistädterstraße 313

RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by "Steinmaurer, Thomas" <th...@dynatrace.com>.

Reid, Jon, thanks for the feedback and comments. Interesting reading.

https://jira.apache.org/jira/browse/CASSANDRA-9766 describes basically exactly what we are experiencing, namely that unthrottling changes nothing at all, so I simply take it that Cassandra itself is the limiting factor here.

Thomas


Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by Reid Pinchback <rp...@tripadvisor.com>.

Thanks for the reading Jon.  😊


Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by Jon Haddad <jo...@jonhaddad.com>.

CPU waiting on memory will look like CPU overhead.   There's a good post on
the topic by Brendan Gregg:
http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html

Regarding GC, I agree with Reid.  You're probably not going to saturate
your network card no matter what your settings are; Cassandra has way too
much overhead to do that.  It's one of the reasons why the whole zero-copy
streaming feature was added to Cassandra 4.0:
http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

Reid is also correct in pointing out that the method by which you're
monitoring your metrics might be problematic.  With Prometheus, the same
data can show significantly different graphs when using rate vs irate, and
only collecting once a minute would hide a lot of useful data.
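
To illustrate the rate vs irate point: the sketch below is a simplified model, not Prometheus itself. It ignores counter resets and the extrapolation Prometheus applies, and the sample values are hypothetical. A window average (rate-like) and a last-two-samples slope (irate-like) can disagree sharply on the same counter.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value in bytes)


def window_rate(samples: List[Sample]) -> float:
    """Average per-second increase between first and last sample (rate-like)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def instant_rate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only (irate-like)."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical byte counter scraped every 15 s: steady 25 MB/s, then a stall.
MB = 1_000_000
samples = [(0, 0), (15, 375 * MB), (30, 750 * MB), (45, 1125 * MB), (60, 1125 * MB)]

print(window_rate(samples) / MB)   # 18.75
print(instant_rate(samples) / MB)  # 0.0
```

The window average still reports 18.75 MB/s, while the last interval shows that the transfer has stopped.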

If you keep digging and find you're not using all your CPU during GC
pauses, you can try using more GC threads by setting -XX:ParallelGCThreads
to match the number of cores you have, since by default it won't use them
all.  You've got 40 cores in the m4.10xlarge, try
setting -XX:ParallelGCThreads to 40.
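
As a rough illustration of "by default it won't use them all": the sketch below assumes the commonly described HotSpot heuristic (all CPUs up to 8, then 5/8 of the remainder). That formula is an assumption here, not something stated in this thread; the value a given JVM actually picks can be checked with `java -XX:+PrintFlagsFinal -version`.

```python
def default_parallel_gc_threads(ncpus: int) -> int:
    """Assumed HotSpot heuristic: all CPUs up to 8, then 5/8 of the rest."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


cores = 40  # core count mentioned above for the m4.10xlarge
print(default_parallel_gc_threads(cores))  # 28
print(f"-XX:ParallelGCThreads={cores}")    # the explicit setting suggested above
```

Under that assumption a 40-core box would default to 28 GC threads, which is the gap the explicit flag closes.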

Jon




Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by Reid Pinchback <rp...@tripadvisor.com>.

Thomas, what is your frequency of metric collection?  If it is minute-level granularity, that can give a very false impression.  I’ve seen CPU and disk throttles that don’t even begin to show visibility until second-level granularity around the time of the constraining event.  Even clearer is 100ms.
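
A small sketch of why minute-level collection can mislead. The per-second numbers are hypothetical and chosen only so that the one-minute average lands near the ~25 MByte/s range discussed in this thread.

```python
# Hypothetical per-second throughput in MByte/s over one minute:
# 50 s at 30 MByte/s, then a 10 s stall at 0.
per_second = [30.0] * 50 + [0.0] * 10

minute_avg = sum(per_second) / len(per_second)
stalled_seconds = sum(1 for v in per_second if v == 0.0)

print(minute_avg)       # 25.0
print(min(per_second))  # 0.0
print(stalled_seconds)  # 10
```

A single one-minute data point of 25 MByte/s looks like a steady cap, while the per-second view shows a 10-second stall that would be the actual thing to investigate.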

Also, are you monitoring your GC activity at all?  GC bound up in a lot of memory copies is not going to manifest that much CPU; it’s memory bus bandwidth you are fighting against then.  It is easy to have a box that looks unused but in reality it’s struggling.  Given that you’ve opened up the floodgates on compaction, that would seem quite plausible to be what you are experiencing.


RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by "Steinmaurer, Thomas" <th...@dynatrace.com>.

Hi Alex,

Increased streaming throughput has been set on the existing nodes only, because it is meant to limit outgoing traffic only, right? At least judging from the name, the documentation, etc.

Compaction throughput has been increased on all nodes, although my understanding is that it would be necessary only on the joining node, so it can catch up with compacting the received SSTables.

We really see no resource (CPU, NW and disk) being somehow maxed out on any node, which would explain the limit in the area of the new node receiving data at ~ 180-200 Mbit/s.

Thanks again,
Thomas


Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by Oleksandr Shulgin <ol...@zalando.de>.


Hi Thomas,

Assuming you have 3 Availability Zones and you are adding the new node to
one of the zones where you already have a node running, it is expected that
it only streams from that node (its local rack).

Have you increased the streaming throughput on the node it streams from or
only on the new node?  The limit applies to the source node as well.  You
can change it online, without a restart, using the nodetool command.
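
A minimal sketch of the online change, wrapped in Python so the commands can be built and inspected before running anything. `setstreamthroughput` (Mbit/s) and `setcompactionthroughput` (MByte/s) are existing nodetool subcommands; the host value and the helper names are placeholders, and the 800 / 500 values are the ones quoted in the question. Values set this way are not written back to cassandra.yaml, so they last only until the node restarts.

```python
import subprocess
from typing import List


def stream_throughput_cmd(mbit_per_s: int, host: str = "127.0.0.1") -> List[str]:
    """nodetool call that sets the outbound streaming cap in Mbit/s (0 = unthrottled)."""
    return ["nodetool", "-h", host, "setstreamthroughput", str(mbit_per_s)]


def compaction_throughput_cmd(mbyte_per_s: int, host: str = "127.0.0.1") -> List[str]:
    """nodetool call that sets the compaction cap in MByte/s (0 = unthrottled)."""
    return ["nodetool", "-h", host, "setcompactionthroughput", str(mbyte_per_s)]


def run_nodetool(cmd: List[str]) -> None:
    """Execute a prepared command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_nodetool(...) on a real node.
    print(" ".join(stream_throughput_cmd(800)))
    print(" ".join(compaction_throughput_cmd(500)))
```

The streaming cap would be applied on the source node (the one being streamed from), in line with the point above that the limit applies there.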

Have you checked if the new node is not CPU-bound?  That is unlikely,
though, given the big instance type and only one node to stream from; it is
more relevant for scenarios with streaming from a lot of nodes.

Cheers,
--
Alex

Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Posted by Reid Pinchback <rp...@tripadvisor.com>.

A high level of compaction seems highly likely to throttle you by sending the service into a GC death spiral, doubly so if any repairs happen to be underway at the same time (I may or may not have killed a few nodes this way, but I admit nothing!).  Even if not in GC hell, it can cause you to episodically blast out writes that rapidly dirty a lot of pages, thus triggering a fill of the disk I/O queue that then starves out read requests from the disk.  More != Better when it comes to compaction.  You want as little compaction as your usage pattern requires of you.  Smoothness of its contribution to the overall load is a better objective.

Jon Haddad did a DataStax conference talk this year on some easy tunings that you’ll likely want to listen to. You’ll probably end up rethinking your vnode count as well. Also note that a fast disk can spend a lot of its time doing the wrong things. His talk covers some of the factors in that.

https://www.youtube.com/watch?v=swL7bCnolkU
