Posted to user@cassandra.apache.org by Vitali Dyachuk <vd...@gmail.com> on 2018/09/12 12:42:27 UTC

nodetool rebuild

Hi,
I'm currently streaming data with nodetool rebuild on 2 nodes; each node is
streaming from a different location. The problem is that it takes ~7 days to
stream 4 TB of data to one node, while at ~150 Mbit/s on each side it should
take around ~2.5 days, and there are spare resources on the destination nodes
and in the source regions.
I've increased the stream throughput, but it only affects outbound
connections.
Tested with iperf, the bandwidth is 600 Mbit/s in both directions. Last week
I changed the compaction strategy from STCS to LCS because of huge sstables,
and their compaction is still ongoing.
How does the rebuild command work? Does it calculate the token ranges, then
request the needed sstables from the source node and start streaming? How can
the streaming be sped up?
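For reference, a minimal sketch of how I start and watch the rebuild with
stock nodetool; "src_dc" is a placeholder for the source DC name:

    # pull this node's ranges from the named source DC
    nodetool rebuild -- src_dc

    # watch active stream sessions and per-file progress
    nodetool netstats

    # show the stream throttle currently in effect (0 = unthrottled)
    nodetool getstreamthroughput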

Vitali.

Re: nodetool rebuild

Posted by Vitali Dyachuk <vd...@gmail.com>.
Dinesh, this is my understanding of the streaming options in C* 3.0:

1) nodetool rebuild - the default way to stream data to a new node when
adding new regions.
pros: simply run rebuild to stream data to the new node: nodetool rebuild
<dc> &
cons: If internode compression or per-table compression is enabled, streaming
takes too much time because of the compression overhead; if compression is
disabled we need much more disk space, and the gain in streaming speed is
only ~15%. It is also single-threaded per remote node, which affects the
overall streaming time. And if the streaming process fails you have to
rebuild from scratch, after first deleting all the streamed data.

2) Manually get the new node's tokens with SELECT tokens FROM system.local;
then find all replicas for those token ranges, create sstable snapshots for
those ranges on the remote nodes, manually work out which sstables contain
the ranges, and only then copy them to the new node (the first steps are
sketched below).
pros: If we already have snapshots on all nodes, finding these ranges in the
snapshotted sstables is more or less feasible.
cons: A very difficult manual procedure, which in the end needs a repair to
validate data consistency.
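A minimal sketch of those first lookup steps, assuming cqlsh and nodetool
access; "my_ks", "my_table" and "some_key" are hypothetical names:

    # on the new node: list the tokens it owns
    cqlsh -e "SELECT tokens FROM system.local;"

    # map token ranges to replica endpoints for the keyspace
    nodetool ring my_ks

    # on a source replica: list the sstables holding a given partition key
    nodetool getsstables my_ks my_table some_key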

3) Running repair on the new node with cassandra-reaper.
   pros: If the rebuild process failed at, say, 80%, it makes sense to run
repair, which will eventually stream the missing data to the new node.
   cons: Takes too much time to finish, since it calculates a Merkle tree for
each token range, finds the differences and only then streams: 6 hours for
10 GB of data in our case.
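For comparison, a sketch of the rough manual equivalent of what reaper
orchestrates ("my_ks" is a placeholder); reaper essentially splits this into
many small token-range segments:

    # repair only this node's primary ranges; run on each node in turn
    nodetool repair -pr my_ks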

If we are scaling out existing data centers, the bootstrap process takes
care of streaming data to the new node; we just need to add a node to the
region.
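For completeness, a sketch of the relevant setting on the joining node; true
is already the default, so this is usually implicit:

    # cassandra.yaml on the new node: stream data automatically while joining
    auto_bootstrap: true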

Vitali

Re: nodetool rebuild

Posted by Vitali Dyachuk <vd...@gmail.com>.
Yes, we are using 256 vnodes. The keyspace is configured with
NetworkTopologyStrategy across 4 regions, with RF 3.
Copying SSTables and running cleanup is a good idea.

Re: nodetool rebuild

Posted by Dinesh Joshi <di...@yahoo.com.INVALID>.
It would be helpful to describe your setup - specifically, are you using vnodes? How is the keyspace set up? One option would be to copy SSTables from the replicas and run cleanup; that might actually be faster. Since the SSTables are already compressed, you should use a tool that copies without compressing the data stream in transit (a sketch follows).
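A minimal sketch of that approach, assuming rsync over SSH; the host, paths
and "my_ks"/"my_table" names are placeholders:

    # copy one table's sstables from a source replica; without -z, rsync does
    # not re-compress the already-compressed data in transit
    rsync -av /var/lib/cassandra/data/my_ks/my_table-<id>/ \
        newnode:/var/lib/cassandra/data/my_ks/my_table-<id>/

    # on the new node: pick up the copied sstables, then drop out-of-range data
    nodetool refresh my_ks my_table
    nodetool cleanup my_ks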

Dinesh

Re: nodetool rebuild

Posted by Vitali Dyachuk <vd...@gmail.com>.
Both stream throughput settings are set to 0, meaning there is no stream
throttling on the C* side. Yes, I see high CPU used by the STREAM-IN thread,
and the sstables are compressed by up to 80%.
What about copying sstables with rsync and then running repair? Probably it's
not that simple, but if the data is RF 3, one node should have all the key
ranges, and repair would not have to recalculate all the hashes?

Vitali

Re: nodetool rebuild

Posted by "dinesh.joshi@yahoo.com.INVALID" <di...@yahoo.com.INVALID>.
It's a long shot, but do you have stream_throughput_outbound_megabits_per_sec or inter_dc_stream_throughput_outbound_megabits_per_sec set to a low value?
You're right that 3.0 streaming uses one thread each for the incoming and outgoing connection per peer. It not only reads the bytes off the channel but also deserializes the partitions on that same thread. If you see high CPU use by a STREAM-IN thread then your streaming is CPU bound, and in that situation a more powerful CPU will definitely help. Dropping internode compression and encryption will also help. Are your SSTables compressed?
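A sketch of where those knobs live and one way to spot a CPU-bound stream;
the defaults shown are from a stock 3.0 cassandra.yaml as I recall it, and
the pgrep pattern assumes the usual daemon class name:

    # cassandra.yaml: throttles in Mbit/s; 0 disables the cap
    stream_throughput_outbound_megabits_per_sec: 200
    inter_dc_stream_throughput_outbound_megabits_per_sec: 200

    # per-thread CPU for the Cassandra JVM; look for hot STREAM-IN threads
    top -H -p "$(pgrep -f CassandraDaemon)"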
Dinesh 

Re: nodetool rebuild

Posted by Vitali Dyachuk <vd...@gmail.com>.
None of these throttles helps streaming if you have even 150-200 Mbit/s of
bandwidth, which is affordable in any cloud. Tweaking network TCP memory,
window size, etc. does not help; the bottleneck is not the network.
These are my findings on how streaming is limited in C* 3.0.*:

1) Streaming of a particular range to the new node is limited to a single
thread, and no tweaking of CPU affinity etc. helps; a more powerful VM
probably would.
2) Disabling internode_compression and disabling per-table compression helps
a bit in our case (both switches are sketched below this list).
3) When a stream is dropped there is no resume for the streaming range, so it
starts again from the beginning.
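A sketch of the two compression switches; "my_ks.my_table" is a placeholder,
and internode_compression lives in cassandra.yaml and needs a restart:

    # cassandra.yaml: valid values are all, dc, none
    internode_compression: none

    -- CQL, per table; only newly written sstables become uncompressed
    ALTER TABLE my_ks.my_table WITH compression = {'enabled': 'false'};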

One option could be to create snapshots of the sstables on the source nodes,
copy all the sstable snapshots to the new node, and then run repair; would
that work with ~5 TB of data and RF 3?
How is it possible at all to stream data to a new node (or nodes) quickly?
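A sketch of the snapshot step, with a hypothetical tag name; snapshots are
hard links, so creating them is cheap on the source:

    # on each source replica: snapshot the keyspace under a named tag
    nodetool snapshot -t pre-rebuild my_ks

    # the files appear under each table directory:
    #   .../data/my_ks/<table>-<id>/snapshots/pre-rebuild/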

Vitali.

Re: nodetool rebuild

Posted by Surbhi Gupta <su...@gmail.com>.
Increase 3 throughput settings:
Compaction throughput
Stream throughput
Inter-DC stream throughput (if rebuilding from another DC)

Set all of the above to 0 and see if there is any improvement; later, set a
real value if you can't leave them at 0 (the commands are sketched below).
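A sketch of the runtime commands; they take effect immediately but revert to
the cassandra.yaml values on restart, and 0 means unthrottled:

    nodetool setcompactionthroughput 0
    nodetool setstreamthroughput 0
    nodetool setinterdcstreamthroughput 0

    # verify
    nodetool getcompactionthroughput
    nodetool getstreamthroughput
    nodetool getinterdcstreamthroughput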
