Posted to user@cassandra.apache.org by Sam Kramer <sa...@gmail.com> on 2021/11/24 15:36:50 UTC

Which source replica does rebuild stream from?

Hi,

We're looking into adding a second datacenter to our cluster via a
rebuild, and we're curious about how Cassandra determines which
replica in the source datacenter to rebuild from. For a bit more
context, we're using the Ec2Snitch with the dynamic snitch enabled,
and NetworkTopologyStrategy for all of our keyspaces with RF = 3.

Looking at the source code, it appears that the source is the closest
replica in the source datacenter, as determined by the snitch
(https://github.com/apache/cassandra/blob/cassandra-3.11.11/src/java/org/apache/cassandra/dht/RangeStreamer.java#L226),
which I think is generally fine. Is this correct, or am I misreading
the code?

If so, there appears to be an edge case around consistency that I
would like to clarify:

Assuming identical topologies, there is no strict guarantee that data
from every source replica is streamed to the destination datacenter.
This is because the snitch determines proximity: it could have removed
a node from its own list for being down, or the dynamic snitch could
have weighted it with a higher (worse) score.

As a result, when each node is rebuilt in its respective rack, it is
entirely possible for all racks to receive the same data from the
same source replica, and that replica may not be fully consistent.
Is that right?
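The selection behaviour described above can be illustrated with a minimal Java sketch. It is a simplified model, not Cassandra's actual RangeStreamer code: the `Replica` record, the `pickSource` method, the node names and the proximity scores are all hypothetical. It assumes that, for a given range, replicas are ordered by a snitch score (lower means closer) and that the first replica considered alive becomes the stream source.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Simplified model of proximity-based source selection (hypothetical names,
// not Cassandra's API): sort replicas by score, take the first live one.
public class SourceSelectionSketch {

    // A candidate source replica with a made-up snitch score and liveness flag.
    record Replica(String name, double score, boolean alive) {}

    // Pick the first live replica in proximity order, or empty if none is live.
    static Optional<String> pickSource(List<Replica> replicas) {
        return replicas.stream()
                .sorted(Comparator.comparingDouble(Replica::score))
                .filter(Replica::alive)
                .map(Replica::name)
                .findFirst();
    }

    public static void main(String[] args) {
        // The three DC1 replicas of one range (RF=3, one per rack).
        // Scores are illustrative only.
        List<Replica> dc1 = List.of(
                new Replica("dc1-rackA", 0.2, true),
                new Replica("dc1-rackB", 0.5, true),
                new Replica("dc1-rackC", 0.9, true));

        // Each DC2 replica of that range makes the same choice when it sees
        // the same scores, so all of them stream from one DC1 replica.
        for (String dest : List.of("dc2-rackA", "dc2-rackB", "dc2-rackC")) {
            System.out.println(dest + " streams from " + pickSource(dc1).orElse("none"));
        }
    }
}
```

With identical scores, all three destination nodes stream from `dc1-rackA`, so `dc1-rackB` and `dc1-rackC` never contribute data for that range. In a real cluster each node keeps its own dynamic snitch scores, so the choices may differ, but nothing forces them to.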

Cheers,
Sam

Re: Which source replica does rebuild stream from?

Posted by Jeff Jirsa <jj...@gmail.com>.
Using EACH_QUORUM and LOCAL_QUORUM here gives you some safety during the transient steps, but it also suggests you have control over when you move traffic.

Is all traffic going to the first DC while you add the second?
If so, set RF=3 in the new DC and run repair before you move traffic.
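One way to see why the repair matters before moving traffic is the standard quorum-overlap argument: a read that contacts r of n replicas is only guaranteed to reach one of the h replicas holding a write if r + h > n. The Java sketch below applies that to a single DC2 range at RF=3 read with LOCAL_QUORUM. It assumes a worst case in which the write was acknowledged in DC1 before DC2 had replicas, and in which DC2 replicas may each have been rebuilt from the one DC1 replica that missed it. It ignores writes that arrive during the rebuild, and the class and method names are hypothetical.

```java
// Quorum-overlap check for LOCAL_QUORUM reads in the new DC (simplified model).
public class LocalQuorumCheck {

    // Quorum size for a replication factor: floor(rf / 2) + 1.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // Pigeonhole: a read touching r of n replicas is guaranteed to include at
    // least one of the h replicas holding the write only when r + h > n.
    static boolean readSeesWrite(int n, int r, int h) {
        return r + h > n;
    }

    public static void main(String[] args) {
        int dc2Rf = 3;
        int localQuorum = quorum(dc2Rf); // 2 of 3

        // h = DC2 replicas that hold the write after rebuild. h = 0 is the worst
        // case (every DC2 replica streamed from the stale DC1 replica);
        // h = dc2Rf is the state after a successful repair.
        for (int h = 0; h <= dc2Rf; h++) {
            System.out.println("DC2 holders=" + h
                    + " -> LOCAL_QUORUM read guaranteed to see write: "
                    + readSeesWrite(dc2Rf, localQuorum, h));
        }
    }
}
```

With zero or one DC2 holders the read is not guaranteed to see the write; with two or three it is. Repair makes every replica a holder, which is why it should complete before reads move to the new DC.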

If you were using QUORUM instead of LOCAL_QUORUM, you’d:
- go from RF=0 to RF=1 in the new DC
- run rebuild, then run a full repair (or an incremental repair on 4.0+)
- go from RF=1 to RF=2, then rebuild, then repair
- go from RF=2 to RF=3, then rebuild, then repair
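For illustration, the QUORUM procedure above can be written out as a command plan. This Java sketch only prints the steps. The keyspace name `my_ks` and the DC names `DC1`/`DC2` are placeholders, and the exact repair invocation (full vs. incremental, and which nodes to run it on) is an assumption to adapt to your Cassandra version and tooling.

```java
import java.util.ArrayList;
import java.util.List;

// Prints a one-replica-at-a-time plan for growing a new DC.
// Keyspace and DC names are hypothetical placeholders.
public class AddDcPlan {

    static List<String> plan(String keyspace, String srcDc, int srcRf, String newDc, int targetRf) {
        List<String> steps = new ArrayList<>();
        for (int rf = 1; rf <= targetRf; rf++) {
            // 1. Raise the new DC's RF by exactly one.
            steps.add(String.format(
                    "ALTER KEYSPACE %s WITH replication = "
                            + "{'class': 'NetworkTopologyStrategy', '%s': %d, '%s': %d};",
                    keyspace, srcDc, srcRf, newDc, rf));
            // 2. Stream data for the new replicas from the source DC.
            steps.add("on each " + newDc + " node: nodetool rebuild -- " + srcDc);
            // 3. Repair before the next increment (assumed full repair here).
            steps.add("nodetool repair -full " + keyspace);
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : plan("my_ks", "DC1", 3, "DC2", 3)) {
            System.out.println(step);
        }
    }
}
```

Generating the plan rather than typing it makes it harder to skip a repair or to raise the RF by more than one in a single step.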

To tear down a DC, do the inverse.

But again, using EACH_QUORUM and LOCAL_QUORUM here is pretty safe: you’re confining your reads to the DC you query, and you can do a single rebuild + repair after going to RF=3.


> On Nov 25, 2021, at 11:53 AM, Sam Kramer <sa...@gmail.com> wrote:
>
> Yes Jeff, we expect strictly correct responses. Our starting / ending topologies are near-identical (DC1: A/B/C, DC2: A/B/C), and reads are performed at LOCAL_QUORUM, while writes are done at EACH_QUORUM or ALL.

Re: Which source replica does rebuild stream from?

Posted by Sam Kramer <sa...@gmail.com>.
Hi both, thank you for your responses!

Yes Jeff, we expect strictly correct responses. Our starting / ending
topologies are near-identical (DC1: A/B/C, DC2: A/B/C), and reads are
performed at LOCAL_QUORUM, while writes are done at EACH_QUORUM or ALL.

Thanks,
Sam

On Thu, Nov 25, 2021 at 9:38 AM Jeff Jirsa <jj...@gmail.com> wrote:

> If you give me the starting topology, ending topology, and what
> consistency level you use for reads and writes I’ll describe the changes
> you have to do to do this safely

Re: Which source replica does rebuild stream from?

Posted by Jeff Jirsa <jj...@gmail.com>.
The risk is not negligible if you expect strictly correct responses.

The only way to do this correctly is very, very labor-intensive at the moment, and it requires repair between rebuilds and adding replicas incrementally so that you don’t violate consistency.

If you give me the starting topology, the ending topology, and the consistency levels you use for reads and writes, I’ll describe the changes you need to make to do this safely.
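Why replicas must be added one at a time with repair in between can be checked with quorum arithmetic. The Java sketch below is a simplified worst-case model, assuming reads and writes at QUORUM across both DCs, DC1 at RF=3, a write acknowledged by a quorum of DC1 before DC2 existed, and newly rebuilt DC2 replicas that may all have streamed from a replica that missed it. Repair is modelled as making every replica hold the write. The class and method names are hypothetical.

```java
// Worst-case quorum-overlap model for growing a second DC under QUORUM.
public class QuorumStepCheck {

    // Quorum size for n total replicas: floor(n / 2) + 1.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Any quorum read of n replicas overlaps the h holders only if q + h > n.
    static boolean guaranteed(int n, int holders) {
        return quorum(n) + holders > n;
    }

    // Grow DC2 from 0 to targetRf one replica at a time. Returns the first DC2
    // RF at which a QUORUM read may miss the write, or -1 if none does.
    static int firstUnsafeStep(int dc1Rf, int targetRf, boolean repairEachStep) {
        int holders = quorum(dc1Rf); // write acked at QUORUM before DC2 existed
        for (int dc2Rf = 1; dc2Rf <= targetRf; dc2Rf++) {
            int n = dc1Rf + dc2Rf;
            // Worst case: the new replica was rebuilt from a source missing the write.
            if (!guaranteed(n, holders)) {
                return dc2Rf;
            }
            if (repairEachStep) {
                holders = n; // repair brings every replica up to date
            }
        }
        return -1;
    }

    // Jumping DC2 straight from 0 to dc2Rf with no intermediate repair.
    static boolean safeJump(int dc1Rf, int dc2Rf) {
        return guaranteed(dc1Rf + dc2Rf, quorum(dc1Rf));
    }

    public static void main(String[] args) {
        System.out.println("one at a time, repair each step: " + firstUnsafeStep(3, 3, true));
        System.out.println("one at a time, no repair:        " + firstUnsafeStep(3, 3, false));
        System.out.println("jump 0 -> 3 without repair safe: " + safeJump(3, 3));
    }
}
```

Under these assumptions, adding one DC2 replica at a time with a repair after each step never breaks the overlap guarantee (`-1`), skipping the repairs breaks it at RF=2 in DC2, and jumping straight from 0 to 3 breaks it too: a read quorum of 4 out of 6 replicas can consist entirely of the 4 replicas that lack the write.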



> On Nov 25, 2021, at 8:49 AM, Erick Ramirez <er...@datastax.com> wrote:
> 
> Yes, you are correct that the source may not necessarily be fully consistent. But this risk is negligible if your cluster is sized-correctly and nodes are not dropping mutations.

Re: Which source replica does rebuild stream from?

Posted by Erick Ramirez <er...@datastax.com>.
Yes, you are correct that the source may not necessarily be fully
consistent. But this risk is negligible if your cluster is correctly sized
and nodes are not dropping mutations.

If your nodes are dropping mutations because they're overloaded and cannot
keep up with writes, rebuild is probably the least of your problems. Cheers!
