Posted to user@cassandra.apache.org by William Oberman <ob...@civicscience.com> on 2014/05/13 19:41:29 UTC

NTS, vnodes and 0% chance of data loss

I found this:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3CCAEDUwd1erQ-1M-Kfj6ubzSBeseR8dwH+g-KGdPsTnBGqsqcEvg@mail.gmail.com%3E

I read the three referenced cases.  In addition, case 4123 references:
http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html

And even though I *think* I understand all of the issues now, I still want
to double check...

Assumptions:
-A cluster using NTS with options [DC:3]
-Physical layout = In DC, 3 nodes/rack for a total of 9 nodes

No vnodes: I could do token selection using ideas from case 3810 such that
each rack has one replica.  At this point, my "0% chance of data loss"
scenarios are:
1.) Failure of two nodes at random
2.) Failure of 2 racks (6 nodes!)

Vnodes: my "0% chance of data loss" scenarios are:
1.) Failure of two nodes at random
Which means a rack failure (3 nodes) has a non-zero chance of data failure
(right?).

To get specific, I'm in AWS, so racks ~= "availability zones".  In the
years I've been in AWS, I've seen several occasions of "single zone
downtimes", and one time of "single zone catastrophic loss".  E.g. for AWS
I feel like you *have* to plan for a single zone failure, and in terms of
"safety first" you *should* plan for two zone failures.

To mitigate this data loss risk seems rough for vnodes, again if I'm
understanding everything correctly:
-To ensure 0% data loss for one zone => I need RF=4
-To ensure 0% data loss for two zones => I need RF=7

I'd really like to use vnodes, but RF=7 is crazy.

To reiterate what I think is the core idea of this message:
1.) for vnodes 0% data loss => RF=(# of allowed failures at once)+1
2.) racks don't change the above equation at all
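
As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes, as this message does, that replicas could land on any nodes regardless of rack, so surviving N simultaneous node failures takes N+1 copies. The names rf_needed and NODES_PER_ZONE are made up for illustration.

```python
NODES_PER_ZONE = 3  # layout from the assumptions above: 3 nodes per rack/zone

def rf_needed(nodes_lost_at_once):
    """RF required if the lost nodes could all hold replicas of the same data."""
    return nodes_lost_at_once + 1

# One zone lost means 3 nodes lost; two zones lost means 6 nodes lost.
print("one zone: RF =", rf_needed(1 * NODES_PER_ZONE))
print("two zones: RF =", rf_needed(2 * NODES_PER_ZONE))
```

This only restates the formula in point 1; it says nothing about whether the underlying assumption about rack placement holds.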

will

Re: NTS, vnodes and 0% chance of data loss

Posted by William Oberman <ob...@civicscience.com>.

After sleeping on this, I'm sure my original conclusions are wrong.  In all
of the referenced cases/threads, I internalized "rack awareness" and
"hotspots" to mean something different and wrong.  A hotspot didn't mean
multiple replicas in the same rack (as I had been thinking); it meant that the
process of finding replica placement might hit the same vnode
disproportionately often, due to the random association of vnodes <-> {dc,rack}.

To not lead people astray, I think everything in my email below is correct
until: "Which means a rack failure (3 nodes) has a non-zero chance of data
failure (right?)."  And again, my flaw was thinking that when Cassandra
selected replicas for token "X" in a vnode world, that it would possibly
pick vnodes that happened to be on the same rack due to random placements
of the tokens.  That is wrong (looking at the source for NTS), as NTS does
skip over the same rack (though, it will allow multiple in the same rack if
you "fill up"... I guess if someone did DC:4 with 3 racks they'll always
get one rack with two copies of the data, for example).
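
A rough way to see this is a small Python simulation. It is a simplified sketch of the rack-skipping behaviour described above, not the actual NetworkTopologyStrategy source. The function names, the node names, the single DC, and the 16 vnodes per node are all assumptions for illustration.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import combinations

def build_ring(nodes, vnodes_per_node, seed=0):
    """Give every node some random tokens; return a sorted list of (token, node)."""
    rng = random.Random(seed)
    return sorted((rng.getrandbits(64), node)
                  for node in nodes for _ in range(vnodes_per_node))

def replicas_for(ring, rack_of, token, rf):
    """Walk the ring clockwise from token, skipping racks that already hold a replica."""
    all_racks = set(rack_of.values())
    start = bisect_left(ring, (token, ""))
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        if len(replicas) >= rf:
            break
        node = ring[(start + i) % len(ring)][1]
        if node in replicas:
            continue
        if seen_racks == all_racks:
            # Every rack already has a copy, so any further node is acceptable.
            replicas.append(node)
        elif rack_of[node] in seen_racks:
            # Same rack as an earlier replica: set it aside for now.
            if node not in skipped:
                skipped.append(node)
        else:
            replicas.append(node)
            seen_racks.add(rack_of[node])
            if seen_racks == all_racks:
                # Racks are "filled up": fall back to the nodes set aside earlier.
                for extra in skipped:
                    if len(replicas) >= rf:
                        break
                    replicas.append(extra)
    return replicas

# Hypothetical layout from the thread: one DC, 3 racks, 3 nodes per rack.
nodes = [f"r{r}n{n}" for r in range(3) for n in range(3)]
rack_of = {node: node[:2] for node in nodes}
ring = build_ring(nodes, vnodes_per_node=16)

def rack_counts(rf):
    """For every vnode range, count how many replicas each rack holds."""
    return [Counter(rack_of[n] for n in replicas_for(ring, rack_of, tok, rf))
            for tok, _ in ring]

def survives(failed_racks, rf):
    """True if every range keeps at least one replica outside the failed racks."""
    return all(any(rack not in failed_racks for rack in counts)
               for counts in rack_counts(rf))

rack_names = sorted(set(rack_of.values()))
print("DC:3, one replica per rack:",
      all(max(c.values()) == 1 for c in rack_counts(3)))
print("DC:3, survives any two racks failing:",
      all(survives(set(pair), 3) for pair in combinations(rack_names, 2)))
print("DC:4, some rack holds two copies:",
      all(max(c.values()) == 2 for c in rack_counts(4)))
```

In this sketch all three checks should come out True no matter how the random tokens fall, because a walk around the full ring always reaches all three racks before any rack is reused.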

will

On Tue, May 13, 2014 at 1:41 PM, William Oberman
<ob...@civicscience.com> wrote:

> I found this:
>
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3CCAEDUwd1erQ-1M-Kfj6ubzSBeseR8dwH+g-KGdPsTnBGqsqcEvg@mail.gmail.com%3E
>
> I read the three referenced cases.  In addition, case 4123 references:
> http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html
>
> And even though I *think* I understand all of the issues now, I still want
> to double check...
>
> Assumptions:
> -A cluster using NTS with options [DC:3]
> -Physical layout = In DC, 3 nodes/rack for a total of 9 nodes
>
> No vnodes: I could do token selection using ideas from case 3810 such that
> each rack has one replica.  At this point, my "0% chance of data loss"
> scenarios are:
> 1.) Failure of two nodes at random
> 2.) Failure of 2 racks (6 nodes!)
>
> Vnodes: my "0% chance of data loss" scenarios are:
> 1.) Failure of two nodes at random
> Which means a rack failure (3 nodes) has a non-zero chance of data failure
> (right?).
>
> To get specific, I'm in AWS, so racks ~= "availability zones".  In the
> years I've been in AWS, I've seen several occasions of "single zone
> downtimes", and one time of "single zone catastrophic loss".  E.g. for AWS
> I feel like you *have* to plan for a single zone failure, and in terms of
> "safety first" you *should* plan for two zone failures.
>
> To mitigate this data loss risk seems rough for vnodes, again if I'm
> understanding everything correctly:
> -To ensure 0% data loss for one zone => I need RF=4
> -To ensure 0% data loss for two zones => I need RF=7
>
> I'd really like to use vnodes, but RF=7 is crazy.
>
> To reiterate what I think is the core idea of this message:
> 1.) for vnodes 0% data loss => RF=(# of allowed failures at once)+1
> 2.) racks don't change the above equation at all
>
> will
>

RE: NTS, vnodes and 0% chance of data loss

Posted by Mark Farnan <de...@petrolink.com>.

Why not use NetworkTopologyStrategy and specify each region as a 'DC'?

Set up a snitch (PropertyFileSnitch, GossipingPropertyFileSnitch, or even the EC2 region one) to list out which nodes are in which DC.

Then, when creating the keyspace, specify NetworkTopologyStrategy with RF 1 in each DC / rack.

I.e.:

CREATE KEYSPACE fred WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': '1', 'DC3': '1', 'DC1': '1'};

Regards

Mark Farnan
