Posted to user@cassandra.apache.org by Robert Hellmans <ro...@aastra.com> on 2012/08/24 13:00:36 UTC

Cluster temporarily split into segments

Hi !
 
I'm preparing the test below. I've found a lot of information about
dead-node replacements and about adding extra nodes to increase capacity,
but didn't find anything about this segmentation issue. Anyone who can
share experience/ideas ?
 
 
Setup:
Cluster with 6 nodes {A,B,C,D,E,F}, RF=6, using CL=ONE (read) and
CL=ALL (write).
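
For concreteness, a minimal sketch of what that setup looks like from a
client, assuming the DataStax Python driver; the node names, keyspace
and table are made up for illustration and are not part of the setup
described above:

    # A minimal sketch, assuming the DataStax Python driver; node names,
    # keyspace and table are illustrative assumptions.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["nodeA", "nodeB", "nodeC", "nodeD", "nodeE", "nodeF"])
    session = cluster.connect("demo")   # hypothetical keyspace with RF=6

    # Reads at CL=ONE: any single replica that answers is enough.
    read_stmt = SimpleStatement(
        "SELECT value FROM events WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    rows = session.execute(read_stmt, ("some-id",))

    # Writes at CL=ALL: with RF=6 every node must acknowledge, so one
    # unreachable node is enough to fail the write.
    write_stmt = SimpleStatement(
        "INSERT INTO events (id, value) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ALL,
    )
    session.execute(write_stmt, ("some-id", "some-value"))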
 
 
Suppose that connectivity breaks down (for whatever reason) causing two
isolated segments:
S1 = {A,B,C,D} and S2 = {E,F}.
 
Cluster connectivity anomalies will be detected by all nodes in this
setup, so clients in S1 and S2 can be advised to change their CL
strategy. It is extremely important that reads continue to operate in
both S1 and S2, and I don't see any reason why they shouldn't. It is
almost as important that writes in each segment can continue, but to be
able to write at all, the CL strategy definitely needs to be changed.
For instance (see the sketch below):
In S1, change to CL=QUORUM for both reads and writes.
In S2, change CL(write) to TWO/ONE/ANY; CL(read) may be changed to TWO.
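
A rough sketch of what such a per-segment switch could look like in
client code, assuming the DataStax Python driver and assuming the client
somehow knows how many replicas it can currently reach (both of which
are assumptions, not something prescribed here):

    # Sketch: pick a consistency level based on how many of the 6
    # replicas the client can currently reach (illustrative only).
    from cassandra import ConsistencyLevel

    def pick_levels(reachable_replicas):
        """Return (read_cl, write_cl) for a segment of the given size."""
        if reachable_replicas >= 4:      # e.g. segment S1 = {A,B,C,D}
            return ConsistencyLevel.QUORUM, ConsistencyLevel.QUORUM
        elif reachable_replicas >= 2:    # e.g. segment S2 = {E,F}
            return ConsistencyLevel.TWO, ConsistencyLevel.ONE
        else:                            # a single surviving node
            return ConsistencyLevel.ONE, ConsistencyLevel.ANY  # ANY is write-only

    read_cl, write_cl = pick_levels(reachable_replicas=2)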
 
During the connectivity breakdown, clients in both S1 and S2
simultaneously change/add/delete data. 
 
 
 
So now to the interesting question: what happens when S1 and S2
re-establish full connectivity again ?
Again, the re-connectivity event will be detected, so should I trigger
some special repair sequence ?
Or should I already have taken some action when the connectivity broke ?
What about the connectivity dropout time being longer or shorter than
max_hint_window ?
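
On the client side, one hedged sketch of how the break and the
re-connect could be detected, assuming the DataStax Python driver's host
state listener; the class name and callback bodies are illustrative
placeholders, not something the setup above requires:

    # Sketch: react to nodes going down/up from the client, assuming the
    # DataStax Python driver. Callback bodies are placeholders.
    from cassandra.cluster import Cluster
    from cassandra.policies import HostStateListener

    class SegmentWatcher(HostStateListener):
        def on_down(self, host):
            print("lost contact with", host)      # e.g. relax the write CL here

        def on_up(self, host):
            print("regained contact with", host)  # e.g. restore CL, schedule repair

        def on_add(self, host):
            pass

        def on_remove(self, host):
            pass

    cluster = Cluster(["nodeA", "nodeB"])
    cluster.register_listener(SegmentWatcher())
    session = cluster.connect()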
 
 
 
 
Rds /Robert
 
 
 

Re: Cluster temporarily split into segments

Posted by aaron morton <aa...@thelastpickle.com>.
> using CL=ONE (read) and CL=ALL (write).
Using this setting you are saying the application should fail in the case of a network partition: you are valuing Consistency over Availability when a partition occurs. Mixing the CL levels in response to a partition will make it difficult to reason about the consistency of the data. Consider other approaches such as CL QUORUM.

While using CL ONE for reads looks good, if you require strong consistency you then have to use ALL for writes. QUORUM for reads and writes may be a better choice. Using RF 3 and 6 nodes would give you pretty good availability in the face of node failures (for background see http://thelastpickle.com/2011/06/13/Down-For-Me/ ).
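
A minimal sketch of that RF 3 / QUORUM-QUORUM alternative, again
assuming the DataStax Python driver and a made-up keyspace and table:

    # Sketch: keyspace with RF=3 and QUORUM on both reads and writes.
    # Keyspace/table names are illustrative assumptions.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["nodeA"])
    session = cluster.connect()
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS demo3 WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 3}"
    )
    session.execute(
        "CREATE TABLE IF NOT EXISTS demo3.events (id text PRIMARY KEY, value text)"
    )

    # With RF=3 a QUORUM is 2 replicas, so any single node can be down
    # (or partitioned away) without blocking reads or writes.
    stmt = SimpleStatement(
        "SELECT value FROM demo3.events WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    rows = session.execute(stmt, ("some-id",))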

Or relax the Consistency and use CL ONE for reads and CL QUORUM for writes. The writes are still sent to RF nodes, but we can no longer guarantee reads will see them. 

If the high RF is for 
> Suppose that connectivity breaks down (for whatever reason) causing two isolated segments:
> S1 = {A,B,C,D} and S2 = {E,F}.

Do clients still have access to the entire cluster ? Normally we would expect clients to try different nodes until they either fail or find a partition with enough UP nodes to service the request.

> to be able to write at all, the CL strategy definitely needs to be changed.
> In S1, change to CL=QUORUM for both reads and writes.
> In S2, change CL(write) to TWO/ONE/ANY; CL(read) may be changed to TWO.
Whatever the choice, you can imagine a partition where the only thing that works for writes is CL ONE, e.g. if the cluster split 3/3, QUORUM would not work. 
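
To make the arithmetic explicit (a small sketch, not from the thread): a
QUORUM needs floor(RF/2) + 1 replicas, so with RF 6 that is 4.

    # Quorum arithmetic for the example above (illustrative helper, not a
    # driver API).
    def quorum(rf):
        return rf // 2 + 1

    print(quorum(6))  # 4 -> a QUORUM read/write needs 4 of the 6 replicas
    # In a 4/2 split, only the 4-node segment can satisfy QUORUM.
    # In a 3/3 split, neither segment has 4 replicas, so QUORUM fails on
    # both sides and only ONE/TWO/THREE (or ANY for writes) remains usable.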

> So now to the interesting question: what happens when S1 and S2 re-establish full connectivity again ?
If you are using CL ALL for writes, the easiest thing to do is stop writing when the cluster partitions and resume when it comes back. 

If you drop the CL during writes, reads will be inconsistent until either Hinted Handoff (HH) has finished or you run repairs. 


>  It is extremely important that reads will continue to operate in both S1 and S2
If it's important that reads continue and are Consistent, I would look at RF 3 with QUORUM / QUORUM. 

If it's important that reads continue and consistency can be relaxed, I would look at RF 3 (or 6) and read ONE / write QUORUM.

Hope that helps. 
  
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
