Posted to user@cassandra.apache.org by Fd Habash <fm...@gmail.com> on 2019/06/28 14:52:31 UTC

Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Hi all …

The DataStax & Apache docs are clear: run ‘nodetool repair’ after you alter a keyspace to change its RF or replication strategy (RS).

However, the details are all over the place as to what type of repair to run and on which nodes to run it. Neither of the above doc authorities is clear, and what you find on the internet is quite contradictory.

For example, this IBM doc suggests running both the ‘alter keyspace’ and the repair on EACH node affected, or on ‘each node you need to change the RF on’. Others suggest running ‘repair -pr’.

On a cluster of one DC and three racks, this is how I understand it …
1. Run the ‘alter keyspace’ on a SINGLE node.
2. As for repairing the altered keyspace, I assume there are two options …
a. Run ‘repair -full [keyspace]’ on all nodes in all racks
b. Run ‘repair -pr -full [keyspace]’ on all nodes in all racks
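
For concreteness, roughly the commands I have in mind (my_ks and dc1 are
placeholders - a sketch, not verified syntax):

    -- once, from cqlsh on any single node:
    ALTER KEYSPACE my_ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    # option a, on every node in every rack:
    nodetool repair -full my_ks

    # option b, also on every node in every rack:
    nodetool repair -pr -full my_ks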

Does this sound correct?

----------------
Thank you


Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Posted by Jeff Jirsa <jj...@gmail.com>.
RF=5 allows you to lose two hosts without losing quorum.

Many teams can calculate their hardware failure rate and replacement time. If you can do both of these things, you can pick an RF that meets your durability and availability SLO. For sufficiently high SLOs you’ll need RF > 3
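
For reference, the arithmetic behind that: quorum is floor(RF/2) + 1, so

    RF=3: quorum = 2, tolerates 1 host down
    RF=4: quorum = 3, still tolerates only 1 host down
    RF=5: quorum = 3, tolerates 2 hosts down

which is why RF=4 buys durability but no extra availability at quorum.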



> On Jun 30, 2019, at 11:58 PM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
>> On Sat, Jun 29, 2019 at 5:49 AM Jeff Jirsa <jj...@gmail.com> wrote:
> 
>> If you’re at RF=3 and read/write at quorum, you’ll have full visibility of all data if you switch to RF=4 and continue reading at quorum, because quorum of 4 is 3, so you’re guaranteed to overlap with at least one of the two nodes that got all earlier writes.
>> 
>> Going from 3 to 4 to 5 requires a repair after 4.
> 
> Understood, thanks for detailing it.
> 
> At the same time, is it ever practical to use RF > 3?  Is it practical to switch to 5 if you already have 3?
> 
> I imagine this question is popping up more often in the context of switching from RF < 3 to RF = 3, as well as switching from non-NTS to NTS, in which case it is indeed quite troublesome, as you have pointed out.
> 
> --
> Alex
> 

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Sat, Jun 29, 2019 at 5:49 AM Jeff Jirsa <jj...@gmail.com> wrote:

> If you’re at RF=3 and read/write at quorum, you’ll have full visibility
> of all data if you switch to RF=4 and continue reading at quorum, because
> quorum of 4 is 3, so you’re guaranteed to overlap with at least one of the
> two nodes that got all earlier writes.
>
> Going from 3 to 4 to 5 requires a repair after 4.
>

Understood, thanks for detailing it.

At the same time, is it ever practical to use RF > 3?  Is it practical to
switch to 5 if you already have 3?

I imagine this question is popping up more often in the context of switching
from RF < 3 to RF = 3, as well as switching from non-NTS to NTS, in which case
it is indeed quite troublesome, as you have pointed out.

--
Alex

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Posted by Jeff Jirsa <jj...@gmail.com>.
If you’re at RF=3 and read/write at quorum, you’ll have full visibility of all data if you switch to RF=4 and continue reading at quorum, because quorum of 4 is 3, so you’re guaranteed to overlap with at least one of the two nodes that got all earlier writes.

Going from 3 to 4 to 5 requires a repair after 4.
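
Spelled out with hypothetical replicas: at RF=3 the data lives on {A, B, C}, and a quorum write reaches at least 2 of them. Bump to RF=4 and D joins empty; quorum of 4 is 3, so any read touches at least 3 of {A, B, C, D}, which means at least 2 of the original three, and any 2 of those 3 must overlap the 2 that took the write. Jump straight to RF=5 and quorum is still 3, so a read could land on the two new, empty replicas plus the one original that missed the write - hence the repair before the second bump.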


> On Jun 28, 2019, at 8:32 PM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
>> On Fri, Jun 28, 2019 at 11:29 PM Jeff Jirsa <jj...@gmail.com> wrote:
> 
>>  you often have to run repair after each increment - going from 3 -> 5 means 3 -> 4, repair, 4 -> 5 - just going 3 -> 5 will violate consistency guarantees, and is technically unsafe.
> 
> Jeff,
> 
> How is going from 3 -> 4 *not violating* consistency guarantees already?  Are you assuming quorum writes and reads and a perfectly repaired keyspace?
> 
> Regards,
> --
> Alex
> 

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Fri, Jun 28, 2019 at 11:29 PM Jeff Jirsa <jj...@gmail.com> wrote:

>  you often have to run repair after each increment - going from 3 -> 5
> means 3 -> 4, repair, 4 -> 5 - just going 3 -> 5 will violate consistency
> guarantees, and is technically unsafe.
>

Jeff,

How is going from 3 -> 4 *not violating* consistency guarantees already?
Are you assuming quorum writes and reads and a perfectly repaired keyspace?

Regards,
--
Alex

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Posted by Jon Haddad <jo...@jonhaddad.com>.
Yep - not to mention the increased complexity and overhead of going from
ONE to QUORUM, or the increased cost of QUORUM in RF=5 vs RF=3.

If you're in a cloud provider, I've found you're almost always better off
adding a new DC with a higher RF, assuming you're on NTS like Jeff
mentioned.
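
A sketch of what that looks like (DC names, RF values and my_ks are
placeholders; the usual new-DC steps - provisioning nodes, updating client
routing - still apply):

    ALTER KEYSPACE my_ks WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'dc_old': 3,
      'dc_new': 5
    };

    # then on each node in the new DC, stream the existing data:
    nodetool rebuild -- dc_old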

On Fri, Jun 28, 2019 at 2:29 PM Jeff Jirsa <jj...@gmail.com> wrote:

> For just changing RF:
>
> You only need to repair the full token range - how you do that is up to
> you. Running `repair -pr -full` on each node will do that. Running `repair
> -full` will do it multiple times, so it's more work, but technically
> correct. The caveat that few people actually appreciate about changing
> replication factors (# of copies per DC) is that you often have to run
> repair after each increment - going from 3 -> 5 means 3 -> 4, repair, 4 ->
> 5 - just going 3 -> 5 will violate consistency guarantees, and is
> technically unsafe.
>
> For changing replication strategy:
>
> Changing replication strategy is nontrivial - going from Simple to NTS is
> often easy to do in a truly eventual consistency use case, but becomes much
> harder if you're:
> - using multiple DCs or
> - vnodes + racks or
> - if you must do it without violating consistency.
>
> It turns out if you're not using multiple DCs or racks, then
> SimpleStrategy is fine. But if you are using multiple DCs/racks, then
> changing is very very hard. So usually by the time you're asking how to do
> this, you're in a very bad position.
>
> Do you have simple strategy and multiple DCs?
> Are you using vnodes and racks?
>
> I'd be incredibly skeptical about any blog that tried to give concrete
> steps on how to do this - the steps are probably right 80% of the time, but
> horribly wrong 20% of the time, especially if there's not a paragraph or
> two about racks along the way.
>
>
>
>
>
> On Fri, Jun 28, 2019 at 7:52 AM Fd Habash <fm...@gmail.com> wrote:
>
>> Hi all …
>>
>>
>>
>> The DataStax & Apache docs are clear: run ‘nodetool repair’ after you
>> alter a keyspace to change its RF or replication strategy (RS).
>>
>>
>>
>> However, the details are all over the place as to what type of repair to
>> run and on which nodes to run it. Neither of the above doc authorities is
>> clear, and what you find on the internet is quite contradictory.
>>
>>
>>
>> For example, this IBM doc
>> <https://www.ibm.com/support/knowledgecenter/en/SS3JSW_5.2.0/com.ibm.help.gdha_administering.doc/com.ibm.help.gdha_administering.doc/gdha_changing_replication_factor.html>
>> suggests running both the ‘alter keyspace’ and the repair on EACH node
>> affected, or on ‘each node you need to change the RF on’.  Others
>> <https://myadventuresincoding.wordpress.com/2019/01/29/cassandra-switching-from-simplestrategy-to-networktopologystrategy/>,
>> suggest running ‘repair -pr’.
>>
>>
>>
>> On a cluster of one DC and three racks, this is how I understand it …
>>
>>    1. Run the ‘alter keyspace’ on a SINGLE node.
>>    2. As for repairing the altered keyspace, I assume there are two
>>    options …
>>       1. Run ‘repair -full [keyspace]’ on all nodes in all racks
>>       2. Run ‘repair -pr -full [keyspace]’ on all nodes in all racks
>>
>>
>>
>> Does this sound correct?
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

Posted by Jeff Jirsa <jj...@gmail.com>.
For just changing RF:

You only need to repair the full token range - how you do that is up to
you. Running `repair -pr -full` on each node will do that. Running `repair
-full` will do it multiple times, so it's more work, but technically
correct. The caveat that few people actually appreciate about changing
replication factors (# of copies per DC) is that you often have to run
repair after each increment - going from 3 -> 5 means 3 -> 4, repair, 4 ->
5 - just going 3 -> 5 will violate consistency guarantees, and is
technically unsafe.
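
In command form, the safe 3 -> 5 path looks roughly like this (my_ks and
dc1 are placeholders):

    ALTER KEYSPACE my_ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'dc1': 4};

    # on every node:
    nodetool repair -pr -full my_ks

    ALTER KEYSPACE my_ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'dc1': 5};

    # on every node, again:
    nodetool repair -pr -full my_ks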

For changing replication strategy:

Changing replication strategy is nontrivial - going from Simple to NTS is
often easy to do in a truly eventual consistency use case, but becomes much
harder if you're:
- using multiple DCs or
- vnodes + racks or
- if you must do it without violating consistency.

It turns out if you're not using multiple DCs or racks, then SimpleStrategy
is fine. But if you are using multiple DCs/racks, then changing is very
very hard. So usually by the time you're asking how to do this, you're in a
very bad position.

Do you have simple strategy and multiple DCs?
Are you using vnodes and racks?
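
(Not sure? On any version new enough to have system_schema, this shows the
current strategy and per-DC RFs:

    SELECT keyspace_name, replication
    FROM system_schema.keyspaces
    WHERE keyspace_name = 'my_ks';

and `nodetool status` shows which DCs and racks your nodes are in.)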

I'd be incredibly skeptical about any blog that tried to give concrete
steps on how to do this - the steps are probably right 80% of the time, but
horribly wrong 20% of the time, especially if there's not a paragraph or
two about racks along the way.





On Fri, Jun 28, 2019 at 7:52 AM Fd Habash <fm...@gmail.com> wrote:

> Hi all …
>
>
>
> The DataStax & Apache docs are clear: run ‘nodetool repair’ after you
> alter a keyspace to change its RF or replication strategy (RS).
>
>
>
> However, the details are all over the place as to what type of repair to
> run and on which nodes to run it. Neither of the above doc authorities is
> clear, and what you find on the internet is quite contradictory.
>
>
>
> For example, this IBM doc
> <https://www.ibm.com/support/knowledgecenter/en/SS3JSW_5.2.0/com.ibm.help.gdha_administering.doc/com.ibm.help.gdha_administering.doc/gdha_changing_replication_factor.html>
> suggests running both the ‘alter keyspace’ and the repair on EACH node
> affected, or on ‘each node you need to change the RF on’.  Others
> <https://myadventuresincoding.wordpress.com/2019/01/29/cassandra-switching-from-simplestrategy-to-networktopologystrategy/>,
> suggest running ‘repair -pr’.
>
>
>
> On a cluster of one DC and three racks, this is how I understand it …
>
>    1. Run the ‘alter keyspace’ on a SINGLE node.
>    2. As for repairing the altered keyspace, I assume there are two
>    options …
>       1. Run ‘repair -full [keyspace]’ on all nodes in all racks
>       2. Run ‘repair -pr -full [keyspace]’ on all nodes in all racks
>
>
>
> Does this sound correct?
>
>
>
> ----------------
> Thank you
>
>
>