Posted to user@cassandra.apache.org by "R. T." <ra...@protonmail.com.INVALID> on 2019/06/12 23:36:27 UTC

very slow repair

Hi,

I am trying to run a repair for the first time on a specific column family in a specific keyspace, and it seems to be going super slow.

I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a non-incremental, DC-parallel one. This column family is around 4 TB and is written to heavily (compared with other CFs), so since the repair is going to take 2 months (according to the ETA in Reaper), does that mean the entropy in this CF will be high again by the time the repair finishes?

How can I speed up the process? Is there any way to diagnose bottlenecks?

Thank you,

W

Re: very slow repair

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Thu, Jun 13, 2019 at 2:09 PM Léo FERLIN SUTTON
<lf...@mailjet.com.invalid> wrote:

>> Last, but not least: are you using the default number of vnodes, 256?  The
>> overhead of a large number of vnodes (times the number of nodes) can be
>> quite significant.  We've seen major improvements in repair runtime after
>> switching from 256 to 16 vnodes on Cassandra version 3.0.
>
>
> Is there a recommended procedure to switch the number of vnodes?
>

Yes.  One should deploy a new virtual DC with the desired configuration and
rebuild from the original one, then decommission the old virtual DC.

With the smaller number of vnodes you should use the
allocate_tokens_for_keyspace configuration parameter to ensure uniform load
distribution.  The caveat is that the nodes allocate tokens before they
bootstrap, so the very first nodes will not have the keyspace information
available.  This can be worked around, though it is not trivial.  See this
thread for our past experience:
https://lists.apache.org/thread.html/396f2d20397c36b9cff88a0c2c5523154d420ece24a4dafc9fde3d1f@%3Cuser.cassandra.apache.org%3E
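
For illustration only (the DC and keyspace names below are placeholders,
not taken from this thread), the rough shape of the procedure is:

    # cassandra.yaml on the nodes of the new virtual DC
    num_tokens: 16
    allocate_tokens_for_keyspace: my_keyspace

    # after adding the new DC to the keyspace's replication settings,
    # stream data to each new node from the original DC:
    nodetool rebuild -- OLD_DC

    # and finally, on each node of the old virtual DC:
    nodetool decommission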

--
Alex

Re: very slow repair

Posted by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID>.
>
> Last, but not least: are you using the default number of vnodes, 256?  The
> overhead of a large number of vnodes (times the number of nodes) can be
> quite significant.  We've seen major improvements in repair runtime after
> switching from 256 to 16 vnodes on Cassandra version 3.0.


Is there a recommended procedure to switch the number of vnodes?

Regards,

Leo

On Thu, Jun 13, 2019 at 12:06 PM Oleksandr Shulgin <
oleksandr.shulgin@zalando.de> wrote:

> On Thu, Jun 13, 2019 at 10:36 AM R. T. <ra...@protonmail.com.invalid>
> wrote:
>
>>
>> Well, actually by running cfstats I can see that the totaldiskspaceused
>> is about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was
>> off for a while, that's why there is a difference in space.
>>
>> I am using Cassandra 3.0.6 and
>> my stream_throughput_outbound_megabits_per_sec is at the default setting,
>> which for my version is 200 Mbps (25 MB/s).
>>
>
> And the other setting: compaction_throughput_mb_per_sec?  It is also
> highly relevant for repair performance, as streamed-in files need to be
> compacted with the existing files on the nodes.  In our experience, a
> change in the compaction throughput limit is almost linearly reflected in
> the repair run time.
>
> The default 16 MB/s is too limiting for any production-grade setup, I
> believe.  We go as high as 90 MB/s on AWS EBS gp2 data volumes.  But don't
> take it as gospel; I'd suggest you start increasing the setting (e.g. by
> doubling it) and observe how it affects repair performance (and client
> latencies).
>
> Have you tried with "parallel" instead of "DC parallel" mode?  The latter
> one is really poorly named and it actually means something else, as neatly
> highlighted in this SO answer: https://dba.stackexchange.com/a/175028
>
> Last, but not least: are you using the default number of vnodes, 256?  The
> overhead of a large number of vnodes (times the number of nodes) can be
> quite significant.  We've seen major improvements in repair runtime after
> switching from 256 to 16 vnodes on Cassandra version 3.0.
>
> Cheers,
> --
> Alex
>
>

Re: very slow repair

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Thu, Jun 13, 2019 at 10:36 AM R. T. <ra...@protonmail.com.invalid>
wrote:

>
> Well, actually by running cfstats I can see that the totaldiskspaceused is
> about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was off
> for a while, that's why there is a difference in space.
>
> I am using Cassandra 3.0.6 and
> my stream_throughput_outbound_megabits_per_sec is at the default setting,
> which for my version is 200 Mbps (25 MB/s).
>

And the other setting: compaction_throughput_mb_per_sec?  It is also highly
relevant for repair performance, as streamed-in files need to be compacted
with the existing files on the nodes.  In our experience, a change in the
compaction throughput limit is almost linearly reflected in the repair run
time.

The default 16 MB/s is too limiting for any production-grade setup, I
believe.  We go as high as 90 MB/s on AWS EBS gp2 data volumes.  But don't
take it as gospel; I'd suggest you start increasing the setting (e.g. by
doubling it) and observe how it affects repair performance (and client
latencies).
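
For example, the limit can be inspected and raised at runtime and then
persisted in cassandra.yaml so it survives a restart (the value below is
just an illustration, not a recommendation for your hardware):

    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 32   # MB/s; double later if the nodes keep up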

Have you tried with "parallel" instead of "DC parallel" mode?  The latter
one is really poorly named and it actually means something else, as neatly
highlighted in this SO answer: https://dba.stackexchange.com/a/175028
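
For reference, if you were driving the repair with nodetool rather than
Reaper, the distinction would look roughly like this (keyspace and table
names are placeholders):

    nodetool repair --full -dcpar my_keyspace my_table   # "DC parallel"
    nodetool repair --full my_keyspace my_table          # plain parallel

In Reaper this corresponds to the repair parallelism setting of the
schedule.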

Last, but not least: are you using the default number of vnodes, 256?  The
overhead of a large number of vnodes (times the number of nodes) can be
quite significant.  We've seen major improvements in repair runtime after
switching from 256 to 16 vnodes on Cassandra version 3.0.

Cheers,
--
Alex

Re: very slow repair

Posted by "R. T." <ra...@protonmail.com.INVALID>.
Hi,

Thank you for your reply,

Well, actually by running cfstats I can see that the totaldiskspaceused is about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was off for a while, that's why there is a difference in space.

I am using Cassandra 3.0.6 and my stream_throughput_outbound_megabits_per_sec is at the default setting, which for my version is 200 Mbps (25 MB/s).
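
(For reference, that setting can also be read and adjusted at runtime
without a restart, e.g.:

    nodetool getstreamthroughput
    nodetool setstreamthroughput 400   # megabits/s, illustrative value only

though anything set this way should also go into cassandra.yaml to persist.)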

Cheers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, June 13, 2019 6:04 AM, Laxmikant Upadhyay <la...@gmail.com> wrote:

> Few queries:
> 1. What is the Cassandra version?
> 2. Is the size of the table 4 TB per node?
> 3. What is the value of compaction_throughput_mb_per_sec and stream_throughput_outbound_megabits_per_sec?
>
> On Thu, Jun 13, 2019 at 5:06 AM R. T. <ra...@protonmail.com.invalid> wrote:
>
>> Hi,
>>
>> I am trying to run a repair for the first time on a specific column family in a specific keyspace, and it seems to be going super slow.
>>
>> I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a non-incremental, DC-parallel one. This column family is around 4 TB and is written to heavily (compared with other CFs), so since the repair is going to take 2 months (according to the ETA in Reaper), does that mean the entropy in this CF will be high again by the time the repair finishes?
>>
>> How can I speed up the process? Is there any way to diagnose bottlenecks?
>>
>> Thank you,
>>
>> W
>
> --
>
> regards,
> Laxmikant Upadhyay

Re: very slow repair

Posted by Laxmikant Upadhyay <la...@gmail.com>.
Few queries:
1. What is the Cassandra version?
2. Is the size of the table 4 TB per node?
3. What is the value of compaction_throughput_mb_per_sec and
stream_throughput_outbound_megabits_per_sec?
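
These can be checked on a node with, for example (keyspace and table names
are placeholders):

    nodetool version
    nodetool getcompactionthroughput
    nodetool getstreamthroughput
    nodetool cfstats my_keyspace.my_table   # per-node size of the table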

On Thu, Jun 13, 2019 at 5:06 AM R. T. <ra...@protonmail.com.invalid>
wrote:

> Hi,
>
> I am trying to run a repair for the first time on a specific column family
> in a specific keyspace, and it seems to be going super slow.
>
> I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a
> non-incremental, DC-parallel one. This column family is around 4 TB and is
> written to heavily (compared with other CFs), so since the repair is going
> to take 2 months (according to the ETA in Reaper), does that mean the
> entropy in this CF will be high again by the time the repair finishes?
>
> How can I speed up the process? Is there any way to diagnose bottlenecks?
>
> Thank you,
>
> W
>
>

-- 

regards,
Laxmikant Upadhyay