Posted to user@cassandra.apache.org by Vasileios Vlachos <va...@gmail.com> on 2016/09/12 14:38:38 UTC

Streaming Process: How can we speed it up?

Hello,

We use Cassandra 2.0.17 at the moment and we are rebuilding our nodes; this
involves taking one node down at a time and bringing the new node up with
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node" in
cassandra-env.sh. In order to reduce the streaming time we doubled
stream_throughput_outbound_megabits_per_sec from 200 to 400 on all nodes in
the cluster.

The problem is that streaming takes a long time to complete. On Friday I
asked the IRC channel and jeffj provided some feedback, but I saw his
responses hours later. I have included some graphs at the bottom of this
email which show CPU performance and network utilisation on the cluster
during the streaming process. Basically, jeffj's suspicion was that we are
CPU-bound on the receiving node. The graphs show that CPU utilisation is
not high enough for us to conclude that CPU is our bottleneck; unless
during streaming, Cassandra uses one core per connection/node. Does anyone
know if that's the case?

INFO [main] 2016-09-12 12:34:19,800 StreamResultFuture.java (line 87) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Executing streaming plan for Bootstrap
INFO [main] 2016-09-12 12:34:19,800 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.3.5.2
INFO [main] 2016-09-12 12:34:19,801 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.1
INFO [StreamConnectionEstablisher:1] 2016-09-12 12:34:19,801 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.3.5.2
INFO [main] 2016-09-12 12:34:19,801 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.3.5.3
INFO [main] 2016-09-12 12:34:19,806 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.2
INFO [StreamConnectionEstablisher:3] 2016-09-12 12:34:19,806 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.3.5.3
INFO [StreamConnectionEstablisher:2] 2016-09-12 12:34:19,802 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.1.5.1
INFO [main] 2016-09-12 12:34:19,809 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.3
INFO [StreamConnectionEstablisher:4] 2016-09-12 12:34:19,809 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.1.5.2
INFO [main] 2016-09-12 12:34:19,811 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.4
INFO [StreamConnectionEstablisher:5] 2016-09-12 12:34:19,811 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.1.5.3
INFO [main] 2016-09-12 12:34:19,815 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.3.5.4
INFO [StreamConnectionEstablisher:6] 2016-09-12 12:34:19,818 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.1.5.4
INFO [StreamConnectionEstablisher:3] 2016-09-12 12:34:19,824 StreamSession.java (line 214) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Starting streaming to /10.3.5.4
INFO [STREAM-IN-/10.3.5.4] 2016-09-12 12:34:19,846 StreamResultFuture.java (line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /10.3.5.4 is complete
INFO [STREAM-IN-/10.1.5.1] 2016-09-12 12:34:19,875 StreamResultFuture.java (line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /10.1.5.1 is complete
INFO [STREAM-IN-/10.1.5.2] 2016-09-12 12:34:19,897 StreamResultFuture.java (line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /10.1.5.2 is complete
INFO [STREAM-IN-/10.1.5.3] 2016-09-12 12:34:19,898 StreamResultFuture.java (line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /10.1.5.3 is complete
INFO [STREAM-IN-/10.1.5.4] 2016-09-12 12:34:19,901 StreamResultFuture.java (line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /10.1.5.4 is complete

The above output is from system.log during initiation of the streaming
process on one of the new nodes. The 10.1.X.X nodes are located in a
different DC, so I understand why these nodes are not used for streaming.
However, I do not understand why 10.3.5.4 is not streaming data to
10.3.5.1. Any ideas why this would happen?
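
For readers following the excerpt, here is a rough Python sketch (not part
of the original message) that lists which peers completed their session
almost immediately and which were still streaming. The regular expressions
and the summarise_sessions name are assumptions made for illustration; the
sample lines are a subset copied from the log above, some of them left
wrapped the way the mail client wrapped them.

```python
import re

# Subset of the system.log excerpt: every "Beginning" and "complete" line.
SAMPLE_LOG = """
INFO [main] 2016-09-12 12:34:19,800 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.3.5.2
INFO [main] 2016-09-12 12:34:19,801 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.1
INFO [main] 2016-09-12 12:34:19,801 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.3.5.3
INFO [main] 2016-09-12 12:34:19,806 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.2
INFO [main] 2016-09-12 12:34:19,809 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.3
INFO [main] 2016-09-12 12:34:19,811 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.1.5.4
INFO [main] 2016-09-12 12:34:19,815 StreamResultFuture.java (line 91) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Beginning stream session with /10.3.5.4
INFO [STREAM-IN-/10.3.5.4] 2016-09-12 12:34:19,846 StreamResultFuture.java
(line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /
10.3.5.4 is complete
INFO [STREAM-IN-/10.1.5.1] 2016-09-12 12:34:19,875 StreamResultFuture.java
(line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /
10.1.5.1 is complete
INFO [STREAM-IN-/10.1.5.2] 2016-09-12 12:34:19,897 StreamResultFuture.java
(line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /
10.1.5.2 is complete
INFO [STREAM-IN-/10.1.5.3] 2016-09-12 12:34:19,898 StreamResultFuture.java
(line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /
10.1.5.3 is complete
INFO [STREAM-IN-/10.1.5.4] 2016-09-12 12:34:19,901 StreamResultFuture.java
(line 186) [Stream #d5708c40-78dc-11e6-b7ea-857314f4c01e] Session with /
10.1.5.4 is complete
"""

# Hypothetical patterns matched against the message text shown in the log.
BEGIN_RE = re.compile(r"Beginning stream session with /\s*([\d.]+)")
DONE_RE = re.compile(r"Session with /\s*([\d.]+) is complete")


def summarise_sessions(log_text):
    """Return (peers whose session completed, peers still streaming), sorted."""
    # Collapse all whitespace so that mail-wrapped lines match as well.
    text = " ".join(log_text.split())
    begun = set(BEGIN_RE.findall(text))
    done = set(DONE_RE.findall(text))
    return sorted(done), sorted(begun - done)


completed, still_streaming = summarise_sessions(SAMPLE_LOG)
print("completed:", completed)
print("still streaming:", still_streaming)
```

On this excerpt the sketch reports 10.3.5.2 and 10.3.5.3 as the only peers
whose sessions had not completed, which matches the observation that
10.3.5.4 is not sending any data.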

Looking at cassandra004's network utilisation graph, we can see that the
node was receiving at 20 MBps initially, then at 10 MBps when only one node
was sending data to it. We seem to only be able to receive data at about
10 MBps per sending (Tx) node. Could we do something in order to be able to
stream from more nodes and/or increase the streaming speed?
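
As a quick sanity check on these numbers, the small Python sketch below
(not part of the original message) converts the configured throttle to the
unit used on the graphs. It assumes that "MBps" means megabytes per second,
which the message does not state explicitly; the function name is made up
for illustration.

```python
def megabits_to_megabytes_per_sec(mbit_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_sec / 8.0


# Value of stream_throughput_outbound_megabits_per_sec after doubling it.
throttle_mb_s = megabits_to_megabytes_per_sec(400)
# Approximate rate per sending node read off the graph (assumed megabytes/s).
observed_mb_s = 10.0

print(f"cap: {throttle_mb_s:.0f} MB/s, observed: {observed_mb_s:.0f} MB/s "
      f"({observed_mb_s / throttle_mb_s:.0%} of cap)")
# prints: cap: 50 MB/s, observed: 10 MB/s (20% of cap)
```

Under that assumption even the previous setting of 200 megabits (25 MB/s)
was well above the observed rate, which would suggest the throttle is not
the limiting factor here.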

The graphs:

[images: CPU and network utilisation graphs for cassandra001, cassandra002,
cassandra003 and cassandra004; not included in this plain-text version]

Many Thanks,
Vasilis

P.S.

Thanks to jeffj for his help on IRC!

Re: Streaming Process: How can we speed it up?

Posted by Jens Rantil <je...@tink.se>.
Hi Vasileios,

> unless inter-DC streaming does not suffer from intra-DC streaming
> slowness. Is this correct?

The per-node streaming speed will be the same. However, when you set up a
new DC you can stream to all of its nodes concurrently instead of one node
at a time, which is a huge improvement.

Also, for the record, I've noticed as well that multiple nodes aren't
streaming to the new node at the same time when bringing it up.

Cheers,
Jens

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.

Re: Streaming Process: How can we speed it up?

Posted by Vasileios Vlachos <va...@gmail.com>.
Hello Jens and thanks for your reply,

True; we could add another DC if we had enough resources, but unfortunately
that's not an option.

Even if there were no restrictions and adding another DC were an option, I
would still be concerned if the migration took this long to finish, unless
inter-DC streaming does not suffer from the same slowness as intra-DC
streaming. Is this correct?

Many Thanks,
Vasilis

Re: Streaming Process: How can we speed it up?

Posted by Jens Rantil <je...@tink.se>.
Hi Vasilis,

Have you considered setting up a new DC[1], migrating over your clients and
decommissioning the old cluster instead? Some advantages:

   - It involves less hackery and fewer workarounds, which makes mistakes
   less likely.
   - You can stream data to all nodes in the new DC concurrently, instead
   of doing a single node at a time as you are now.
   - You have more of a point-in-time migration from old DC to new. You can
   easily migrate back to the old DC in case something goes wrong.

AFAIK, the only reasons you couldn't do the above are not having enough
hardware or not enough IP addresses. Otherwise, I'd say the above process is
somewhat of a best practice.

[1]
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

Cheers,
Jens


Re: Streaming Process: How can we speed it up?

Posted by Vasileios Vlachos <va...@gmail.com>.
Thanks for sharing your experience, Ben.

Re: Streaming Process: How can we speed it up?

Posted by Ben Slater <be...@instaclustr.com>.
We’ve successfully used the rsync method you outline quite a few times in
situations where we’ve had clusters that take forever to add new nodes
(mainly due to secondary indexes) and need to do a quick replacement for
one reason or another. As you mention, the main disadvantage we ran into is
that the node doesn’t get cleaned up through the replacement process like a
newly streamed node does (plus the extra operational complexity).
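
For reference, the two-pass rsync procedure quoted below could be laid out
roughly as in this Python sketch, which only builds and prints the commands
and runs nothing. The host names, the "service cassandra" commands and the
rsync flags are assumptions for illustration and are not taken from the
thread; in particular, --delete is added on the data directory on the
assumption that SSTables compacted away on the old node between the two
passes should not linger on the new one.

```python
def rsync_replacement_plan(old_host, new_host,
                           data_dir="/var/lib/cassandra",
                           conf_dir="/etc/cassandra"):
    """Return the steps as (where_to_run, command) pairs. Nothing is executed."""
    # Pull from the old node; the trailing slashes copy directory contents.
    pull_conf = f"rsync -a {old_host}:{conf_dir}/ {conf_dir}/"
    # --delete is an assumption: it drops files that compaction removed on
    # the old node between the two passes.
    pull_data = f"rsync -a --delete {old_host}:{data_dir}/ {data_dir}/"
    return [
        (new_host, "service cassandra stop"),   # assumed service command
        (new_host, f"rm -rf {data_dir}/*"),
        (new_host, pull_conf),
        (new_host, pull_data),                  # first pass, old node still up
        (old_host, "service cassandra stop"),
        (new_host, pull_data),                  # second pass, only the delta
        ("operator", "move old node to a different IP, new node to the old IP"),
        (new_host, "service cassandra start"),
    ]


# Hypothetical host names, printed as a checklist rather than executed.
for where, command in rsync_replacement_plan("old-node", "new-node"):
    print(f"{where}: {command}")
```

The point of the first pass is to copy the bulk of the data while the old
node is still serving traffic, so that the second pass, run after the old
node is stopped, only has to transfer what changed in between.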

Cheers
Ben

On Thu, 15 Sep 2016 at 19:47 Vasileios Vlachos <va...@gmail.com>
wrote:

> Hello and thanks for your responses,
>
> OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
> difference. Any ideas why streaming is limited to only two of the three
> nodes available?
>
> As an alternative to slow streaming I tried this:
>
>  - install C* on a new node, stop the service and delete
>    /var/lib/cassandra/*
>  - rsync /etc/cassandra from old node to new node
>  - rsync /var/lib/cassandra from old node to new node
>  - stop C* on the old node
>  - rsync /var/lib/cassandra from old node to new node again
>  - move the old node to a different IP
>  - move the new node to the old node's original IP
>  - start C* on the new node (no need for the replace_address option in
>    cassandra-env.sh)
>
> This technique has been successful so far for a demo cluster with fewer
> data. The only disadvantage for us is that we were hoping that by streaming
> the SSTables to the new node, tombstones would be discarded (freeing a lot
> of disk space on our live cluster). This is exactly what happened for the
> one node we streamed so far; unfortunately, the slow streaming generates a
> lot of hints which makes recovery a very long process.
>
> Do you guys see any other problems with the rsync method that I've skipped?
>
> Regarding the tombstones issue (if we finally do what I described above),
> I'm thinking sstablsplit. Then compaction should deal with it (I think). I
> have not used sstablesplit in the past, so another thing I'd like to ask is
> if you guys find this a good/bad idea for what I'm trying to do.
>
> Many thanks,
> Vasilis
>
> On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa <jj...@apache.org> wrote:
>
>>
>>
>> On 2016-09-12 09:38 (-0700), daemeon reiydelle <da...@gmail.com>
>> wrote:
>> > Re. throughput. That looks slow for jumbo with 10g. Check your networks.
>> >
>> >
>>
>> It's extremely unlikely you'll be able to saturate a 10g link with a
>> single instance cassandra.
>>
>> Faster Cassandra streaming is a work in progress - being able to send
>> more than one file at a time is probably the most obvious area for
>> improvement, and being able to better deal with the CPU / garbage generated
>> on the receiving side is just behind that. You'll likely be able to stream
>> 10-15 MB/s per sending server or cpu core, whichever is less (in a vnode
>> setup, you'll be cpu bound - in a single-token setup, you'll be stream
>> bound).
>>
>>
>>
> --
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

Re: Streaming Process: How can we speed it up?

Posted by Vasileios Vlachos <va...@gmail.com>.
Hello and thanks for your responses,

OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
difference. Any ideas why streaming is limited to only two of the three
nodes available?

As an alternative to slow streaming I tried this:

 - install C* on a new node, stop the service and delete
   /var/lib/cassandra/*
 - rsync /etc/cassandra from old node to new node
 - rsync /var/lib/cassandra from old node to new node (first pass, while
   C* is still running on the old node)
 - stop C* on the old node
 - rsync /var/lib/cassandra from old node to new node again (final pass,
   to pick up anything written since the first pass)
 - move the old node to a different IP
 - move the new node to the old node's original IP
 - start C* on the new node (no need for the replace_address option in
   cassandra-env.sh)
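
The steps above can be sketched as a dry-run plan that only prints commands
and executes nothing. This is an illustration under assumptions, not part of
the procedure as originally run: the hostname, the `service cassandra
stop/start` commands, and pulling with rsync over SSH from the new node are
all hypothetical, and the IP moves are left as site-specific placeholders.

```python
def rsync_replacement_plan(old_host,
                           data_dir="/var/lib/cassandra",
                           conf_dir="/etc/cassandra"):
    """Return (node, command) pairs for the rsync-based swap, in order.

    Nothing is executed; the caller reviews or prints the plan.
    """
    # -a preserves permissions, ownership and timestamps.
    pull_conf = f"rsync -a {old_host}:{conf_dir}/ {conf_dir}/"
    # --delete drops files on the target that no longer exist on the source,
    # e.g. SSTables compacted away between the two passes.
    pull_data = f"rsync -a --delete {old_host}:{data_dir}/ {data_dir}/"
    return [
        ("new", "service cassandra stop"),            # assumed service name
        ("new", f"rm -rf {data_dir}/*"),
        ("new", pull_conf),
        ("new", pull_data),                           # first pass, old node live
        ("old", "service cassandra stop"),
        ("new", pull_data),                           # final pass, old node down
        ("old", "# move to a different IP (site-specific)"),
        ("new", "# take over the old node's original IP (site-specific)"),
        ("new", "service cassandra start"),
    ]


if __name__ == "__main__":
    # "old-node.example" is a placeholder hostname.
    for node, command in rsync_replacement_plan("old-node.example"):
        print(f"[{node}] {command}")
```

The second data pass is the same command as the first; it should be much
shorter because only files changed since the first pass are transferred.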

This technique has been successful so far for a demo cluster with less
data. The only disadvantage for us is that we were hoping that by streaming
the SSTables to the new node, tombstones would be discarded (freeing a lot
of disk space on our live cluster). This is exactly what happened for the
one node we streamed so far; unfortunately, the slow streaming generates a
lot of hints which makes recovery a very long process.

Do you guys see any other problems with the rsync method that I've missed?

Regarding the tombstones issue (if we finally do what I described above),
I'm thinking of sstablesplit. Then compaction should deal with it (I think). I
have not used sstablesplit in the past, so another thing I'd like to ask is
whether you guys find this a good or bad idea for what I'm trying to do.

Many thanks,
Vasilis


Re: Streaming Process: How can we speed it up?

Posted by Jeff Jirsa <jj...@apache.org>.

On 2016-09-12 09:38 (-0700), daemeon reiydelle <da...@gmail.com> wrote: 
> Re. throughput. That looks slow for jumbo with 10g. Check your networks.
> 
> 

It's extremely unlikely you'll be able to saturate a 10g link with a single Cassandra instance.

Faster Cassandra streaming is a work in progress - being able to send more than one file at a time is probably the most obvious area for improvement, and being able to better deal with the CPU / garbage generated on the receiving side is just behind that. You'll likely be able to stream 10-15 MB/s per sending server or CPU core, whichever is less (in a vnode setup, you'll be CPU-bound - in a single-token setup, you'll be stream-bound).
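
To put the numbers in this thread side by side, a rough calculation may help.
The 400 Mbit/s throttle, the two sending nodes and the roughly 10 MB/s per
sender are taken from the thread; the 500 GB node size is a made-up input, and
the function names are only for illustration.

```python
def megabits_to_megabytes(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def hours_to_stream(data_gb, senders, per_sender_mb_s):
    """Hours to receive data_gb when each sender delivers per_sender_mb_s."""
    total_mb = data_gb * 1000.0                  # decimal GB to MB
    aggregate_mb_s = senders * per_sender_mb_s   # senders stream in parallel
    return total_mb / aggregate_mb_s / 3600.0


if __name__ == "__main__":
    # stream_throughput_outbound_megabits_per_sec = 400, as set in the thread
    cap = megabits_to_megabytes(400)
    print(f"configured cap per sender: {cap:.0f} MB/s")

    # Hypothetical 500 GB node, two senders at about 10 MB/s each
    print(f"estimated time: {hours_to_stream(500, 2, 10):.1f} h")
```

Under these assumptions the configured cap sits well above the observed
per-sender rate, which would be consistent with the throttle setting making no
difference.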



Re: Streaming Process: How can we speed it up?

Posted by daemeon reiydelle <da...@gmail.com>.
Re. throughput. That looks slow for jumbo with 10g. Check your networks.


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


Re: Streaming Process: How can we speed it up?

Posted by Vasileios Vlachos <va...@gmail.com>.
Hello,

We use Nagios + NRPE, PNP4Nagios and a few templates in order to plot
correlating counters on the same graph when needed. For the majority of our
Cassandra-specific checks, we use the JMX console on each node.

On Mon, Sep 12, 2016 at 3:59 PM, Nagh <na...@gmail.com> wrote:

> Hi Vasilis,
> My name is Nagaraj. I'm building a new Cassandra cluster in our
> organization. We are going to use Apache Cassandra 3.0.8. I've seen your
> attachments for monitoring Cassandra. I just want to know which
> monitoring tool you are using for Cassandra metrics and alerts. Do you
> have any suggestions for me? I appreciate your help on this.