Posted to user@cassandra.apache.org by Ben Kaehne <be...@sirca.org.au> on 2012/08/09 03:58:40 UTC

Syncing nodes + Cassandra Data Availability

Good morning,

Our application runs on a 3 node cassandra cluster with RF of 3.

We use quorum operations against this cluster in hopes of guaranteeing
consistency.

One scenario in which an issue can occur is:
Out of our 3 nodes, only 2 are up.
We perform a write to, say, a new key.
The down node is started again and, at the same time, a different node is
brought offline.
At this point, the data we wrote above is on one online node but not on the
other online node, meaning quorum reads will fail.

Surely other people have encountered such an issue before.

We originally disabled hinted handoff so that we would not have to worry
about disks filling up on servers as hints pile up.
Perhaps enabling it would help somewhat (although from what I
have read, it does not completely remedy the situation).

If so, how are you dealing with it?
From what I understand, a read repair (which we have set to 1.0) will
only be performed when a successful read occurs, which will not happen
here.

nodetool repair seems rather slow, is manual, and does not suit our
situation where data has to be available upon demand.

Regards,

-- 
-Ben

Re: Syncing nodes + Cassandra Data Availability

Posted by Tyler Hobbs <ty...@datastax.com>.

On Wed, Aug 8, 2012 at 8:58 PM, Ben Kaehne <be...@sirca.org.au> wrote:

>
>
> Our application runs on a 3 node cassandra cluster with RF of 3.
>
> We use quorum operations against this cluster in hopes of guaranteeing
> consistency.
>
> One scenario in which an issue can occur is:
> Out of our 3 nodes, only 2 are up.
> We perform a write to, say, a new key.
> The down node is started again and, at the same time, a different node is
> brought offline.
> At this point, the data we wrote above is on one online node but not on the
> other online node, meaning quorum reads will fail.
>

So only one of the three nodes is up?  The data should have been written to
two nodes (since your quorum write succeeded): one node that is up, and one
that is down.
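
To make the overlap argument concrete, here is a minimal Python sketch of the
scenario. It is a toy model, not Cassandra code: the node names, the dict-based
store, and the quorum_write/quorum_read helpers are all assumptions made up for
illustration. It assumes the read is allowed to contact two live replicas.

```python
from itertools import combinations

NODES = ("A", "B", "C")   # hypothetical node names
RF = 3
QUORUM = RF // 2 + 1      # 2 of 3 replicas


def quorum_write(store, up, key, value, ts):
    """Apply the write on every live replica; fail if fewer than QUORUM are up."""
    if len(up) < QUORUM:
        raise RuntimeError("unavailable for write")
    for node in up:
        store[node][key] = (ts, value)


def quorum_read(store, contacted, key):
    """Read from the contacted replicas and return the newest value seen."""
    if len(contacted) < QUORUM:
        raise RuntimeError("unavailable for read")
    answers = [store[n][key] for n in contacted if key in store[n]]
    if not answers:
        return None
    return max(answers, key=lambda a: a[0])[1]


store = {n: {} for n in NODES}

# Step 1: C is down, so the quorum write lands on A and B.
quorum_write(store, up={"A", "B"}, key="k", value="v1", ts=1)

# Step 2: C comes back and B goes offline; the read contacts A and C.
after_swap = quorum_read(store, contacted={"A", "C"}, key="k")

# Any two replicas overlap with the two that took the write.
all_pairs = [quorum_read(store, set(p), "k") for p in combinations(NODES, 2)]

print(after_swap, all_pairs)
```

In this model every pair of replicas includes at least one that took the
write, so the quorum read still sees the value. The read only becomes
unavailable when a single node is up, which is the case asked about above.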

>
> Surely other people have encountered such an issue before.
>
> We originally disabled hinted handoff so that we would not have to worry
> about disks filling up on servers as hints pile up.
> Perhaps enabling it would help somewhat (although from what I
> have read, it does not completely remedy the situation).
>

Hints stop being stored after a node has been down for a while (I believe
the default is 1 hour, but it's configurable through cassandra.yaml), so
you shouldn't have to worry about running out of disk space.  Hinted
handoff is definitely the fastest way to restore consistency, and it will
catch almost all cases in Cassandra 1.1 and later.
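
As a rough illustration of why the hint window bounds disk usage, here is a
small Python sketch. It is a toy model under stated assumptions, not how
Cassandra accounts for hints: the hints_stored helper and the write rate are
hypothetical, and the one-hour window is the default mentioned above (the
cassandra.yaml setting is max_hint_window_in_ms).

```python
MAX_HINT_WINDOW_MS = 60 * 60 * 1000  # 1 hour, the default mentioned above


def hints_stored(downtime_ms, writes_per_sec, window_ms=MAX_HINT_WINDOW_MS):
    """Hints kept for one dead replica in this toy model.

    Hints accumulate only while the node has been down for less than
    window_ms; after that, no further hints are stored for it.
    """
    hinted_ms = min(downtime_ms, window_ms)
    return (hinted_ms // 1000) * writes_per_sec


half_hour = hints_stored(30 * 60 * 1000, writes_per_sec=100)
one_hour = hints_stored(60 * 60 * 1000, writes_per_sec=100)
six_hours = hints_stored(6 * 60 * 60 * 1000, writes_per_sec=100)

print(half_hour, one_hour, six_hours)
```

In this model a six-hour outage stores no more hints than a one-hour outage,
so the backlog stops growing once the window closes. Writes missed after the
window are not hinted and would have to be recovered some other way, such as
read repair or nodetool repair.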

-- 
Tyler Hobbs
DataStax <http://datastax.com/>