You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Jonathan Colby <jo...@gmail.com> on 2010/12/12 13:49:21 UTC

Quorum and Datacenter loss

Hi cassandra experts -

We're planning a cassandra cluster across 2 datacenters
(datacenter-aware, random partitioning) with QUORUM consistency.

It seems to me that with 2 datacenters, if one datacenter is lost,
the  reads/writes to cassandra  will fail in the surviving datacenter
because of the N/2 + 1 distribution of replicas.  In other words, you
need more than half of the replicas to respond but in the case of a
datacenter loss you would only ever get 1/2 to respond at best.

Is my logic wrong here?  Is there a way to ensure the nodes in the
alive datacenter respond successfully if the second datacenter is
lost?  Anyone have experience with this kind of problem?

Thanks.

Re: Quorum and Datacenter loss

Posted by Peter Schuller <pe...@infidyne.com>.

> I think there's a flaw in your logic.  Take the following scenario:

Ok, so sorry if I was being unclear. Yes, if you have one out of two
copies in data center B and the remainder in A, then A will be able to
continue operating qt QUOROM in the event of a partition. But B still
won't, so you have not achieved the goal of the cluster being
available and consistent during a partition.

But certainly, if you only care about the uptime in A and not B, your'e fine.

And agreed upon the new DC-local quorum consistency level.

-- 
/ Peter Schuller

Re: Quorum and Datacenter loss

Posted by Dave Viner <da...@gmail.com>.

I think there's a flaw in your logic.  Take the following scenario:
- you use QUORUM for reads and QUROUM for writes
- you have 2 datacenters (DC1, DC2), with 3 servers in each (so 6 nodes
total).
- you set replication factor to 3
- you use RackAwareStrategy

So, you have DC1-S1, DC1-S2, DC1-S3, DC2-S4, DC2-S5, DC2-S6 as nodes. The
quorum count is N/2 +1.  Since RF=3, the quorum is 2.

When setup correctly, a write to any node for key K causes the data to be
replicated to 3 nodes.  Say it's DC1-S1, DC1-S2, DC2-S4.

When you read key K from any node, the coordinator node (the one you ask),
will attempt to contact DC1-S1, DC1-S2, DC2-S4, since those are the nodes
known to hold the value for key K.  The quorum required is 2 nodes, not all
3.  So, as long as the coordinator hears back from 2 of the nodes, it will
return.

In your example, imagine DC2 goes dark.  As long as DC1-S1 and DC1-S2 are
still responding, your read query will succeed.

Of course, you are on the knife-edge.  If you lost any machine in DC1 while
DC2 is out, then you would not be able to satisfy QUORUM reads or writes.

Note that in 0.7, the new LOCAL_QUORUM on reads would solve the issue, I
believe.

Dave Viner

On Sun, Dec 12, 2010 at 9:49 AM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> > Thanks a lot Peter.   So basically we would need to choose a
> > consistency other than QUORUM.    I think in our case consistency is
> > not necessarily an issue since our data is write-once, read-many
> > (immutable data).   I suppose having a replication factor of 4 would
> > result in two nodes in each datacenter having a copy of the data.   If
> > there's a flaw in my logic, please let me know : ]
>
> It would, but note that if you're writing at consistency level ONE
> only a single copy of the data is required to exist before your write
> is ACK:ed back to the client (but it will still be replicated).
>
> --
> / Peter Schuller
>

Re: Quorum and Datacenter loss

Posted by Peter Schuller <pe...@infidyne.com>.

> Thanks a lot Peter.   So basically we would need to choose a
> consistency other than QUORUM.    I think in our case consistency is
> not necessarily an issue since our data is write-once, read-many
> (immutable data).   I suppose having a replication factor of 4 would
> result in two nodes in each datacenter having a copy of the data.   If
> there's a flaw in my logic, please let me know : ]

It would, but note that if you're writing at consistency level ONE
only a single copy of the data is required to exist before your write
is ACK:ed back to the client (but it will still be replicated).

-- 
/ Peter Schuller

Re: Quorum and Datacenter loss

Posted by Jonathan Colby <jo...@gmail.com>.

Thanks a lot Peter.   So basically we would need to choose a
consistency other than QUORUM.    I think in our case consistency is
not necessarily an issue since our data is write-once, read-many
(immutable data).   I suppose having a replication factor of 4 would
result in two nodes in each datacenter having a copy of the data.   If
there's a flaw in my logic, please let me know : ]

On Sun, Dec 12, 2010 at 2:04 PM, Peter Schuller
<pe...@infidyne.com> wrote:
>>> Is my logic wrong here?  Is there a way to ensure the nodes in the
>>> alive datacenter respond successfully if the second datacenter is
>>> lost?  Anyone have experience with this kind of problem?
>>
>> It's impossible to achieve the consistency and availability at the
>> same time. See:
>
> (Assuming partition tolerance)
>
> Anyways, to expand a bit: The final consequence is that if you have a
> cluster that really does need QUORUM consistency, you won't be able to
> survive (in terms of availability, i.e., the cluster serving your
> traffic) data centers going down. If you want to continue operating in
> the case of a partition, you (1) cannot use QUORUM and (2) your
> application must be designed to work with and survive seeing
> inconsistent data.
>
> --
> / Peter Schuller
>

Re: Quorum and Datacenter loss

Posted by Peter Schuller <pe...@infidyne.com>.

>> Is my logic wrong here?  Is there a way to ensure the nodes in the
>> alive datacenter respond successfully if the second datacenter is
>> lost?  Anyone have experience with this kind of problem?
>
> It's impossible to achieve the consistency and availability at the
> same time. See:

(Assuming partition tolerance)

Anyways, to expand a bit: The final consequence is that if you have a
cluster that really does need QUORUM consistency, you won't be able to
survive (in terms of availability, i.e., the cluster serving your
traffic) data centers going down. If you want to continue operating in
the case of a partition, you (1) cannot use QUORUM and (2) your
application must be designed to work with and survive seeing
inconsistent data.

-- 
/ Peter Schuller

Re: Quorum and Datacenter loss

Posted by Peter Schuller <pe...@infidyne.com>.

> Is my logic wrong here?  Is there a way to ensure the nodes in the
> alive datacenter respond successfully if the second datacenter is
> lost?  Anyone have experience with this kind of problem?

It's impossible to achieve the consistency and availability at the
same time. See:

   http://en.wikipedia.org/wiki/CAP_theorem

-- 
/ Peter Schuller

Re: Quorum and Datacenter loss

Posted by Peter Schuller <pe...@infidyne.com>.

> It seems to me that with 2 datacenters, if one datacenter is lost,
> the  reads/writes to cassandra  will fail in the surviving datacenter
> because of the N/2 + 1 distribution of replicas.  In other words, you
> need more than half of the replicas to respond but in the case of a
> datacenter loss you would only ever get 1/2 to respond at best.

So just to go back to the original post: My interpretation of this was
that you wanted to survive in *BOTH* data centers, not just a primary
one. If the latter is the case, it is indeed possible as was just
pointed out (assuming no nodes are otherwise down in the primary data
center).


-- 
/ Peter Schuller