Posted to user@cassandra.apache.org by Brian Bulkowski <br...@bulkowski.org> on 2009/10/13 23:52:31 UTC

eventual consistency question

Greetings,

I'm evaluating Cassandra, like others. I've scrubbed through the mail 
digest, blog posts, and so on, and I've seen my questions asked, but 
I'm not clear on the answers.

I'm doing what others have done: using 3 servers and doing a few test 
inserts to understand the data and consistency model.

Question 1:
    the bootstrap parameter: what does it do, exactly?
    It seems the right thing to do, just playing around, is to start the 
first node with no bootstrap, and the other two with bootstrap.
    But I don't know the hows or whys.

Question 2:
    "how eventual is eventual?"
    Imagine the following case:
       Defaults from storage-conf.xml + replication count 2 (and the IP 
addresses required, etc)
       Up server A (no -b)
       Insert a few values, read, all is good (using _cli)
       Up server B, C (with -b)
       read values from A, B, or C - all is good, appears to be reading 
from A
       wait a few minutes - servers appear quiescent.
       Down server A
       read values from B - values are not available (NPE exception on 
server & _cli interface)

I've read that Cassandra doesn't optimistically replicate, so I 
understand, in theory, that the data inserted to A shouldn't replicate.
I believe that if I had used the proper Thrift interface and asked for 
a replication count of 2, the transaction would have failed.
Yet, I expect that if I asked for replication count 2, I should get it. 
At some point. Eventually. The data has been inserted.
I expect the cluster to work toward replication count 2 regardless of 
the current state of the cluster --- is there a way to achieve this 
behavior?

Question 3:
    "balancing"
       This question is similar to question 2, approached from a different angle.
       I have three nodes which I brought up at the dawn of time. 
They've taken a lot of inserts, and have 1T each.
       Let's say the load now is mostly reads, as the data has already 
been inserted
       I bring up a fourth node.
       Clients (aka app servers) are pointing at the first 3 nodes. I 
have to reconfigure those servers to start using the 4th server, right?
       New writes may take advantage of the 4th server, but no data will 
automatically move?
       Which would mean that the servers would be out of balance, 
perhaps for a long time, perhaps forever?

Thanks for the hints - I'm clearly not "getting" Cassandra yet and don't 
want to foolishly misrepresent it.

Thanks,
-brianb
      
      




Re: eventual consistency question

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Oct 13, 2009 at 4:52 PM, Brian Bulkowski <br...@bulkowski.org> wrote:
> Question 1:
>   the bootstrap parameter: what does it do, exactly?

It's for adding nodes to an existing cluster.  (This is being reworked
to be more automatic for 0.5.)  If you start a node without it, it
assumes it already has the data that's "supposed" to be on it (either
b/c you are starting a new cluster, or restarting an existing node in
one).  If you do specify -b it will contact the cluster you're
starting it in to move the right data to it.
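
To make that concrete, here is a toy sketch of the decision the node
makes at startup.  None of these names come from the Cassandra codebase,
and the real bootstrap streams only the ranges the new node should own,
but the shape of the choice is the same:

    # Toy model only -- not Cassandra source; all names are made up.
    def start_node(bootstrap, local_data, seeds):
        # Return the data this node will serve once it has joined.
        if not bootstrap:
            # No -b: assume whatever is "supposed" to be here is already
            # on local disk (new cluster, or restart of an existing node).
            return local_data
        # -b: contact the existing cluster and copy the right data over
        # before serving requests.  (The real thing streams only the
        # ranges assigned to this node, not everything.)
        claimed = {}
        for seed in seeds:
            claimed.update(seed)
        return claimed

    # A brand-new node joining a cluster that already holds some data:
    existing_member = {"key1": "val1", "key2": "val2"}
    print(start_node(bootstrap=True, local_data={}, seeds=[existing_member]))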

> Question 2:
>   "how eventual is eventual?"
>   Imagine the following case:
>      Defaults from storage-conf.xml + replication count 2 (and the IP
> addresses required, etc)
>      Up server A (no -b)
>      Insert a few values, read, all is good (using _cli)
>      Up server B, C (with -b)
>      read values from A, B, or C - all is good, appears to be reading from A
>      wait a few minutes - servers appear quiescent.
>      Down server A
>      read values from B - values are not available (NPE exception on server
> & _cli interface)
>
> I've read that Cassandra doesn't optimistically replicate, so I understand,
> in theory, that the data inserted to A shouldn't replicate.
> I believe that if I had used the proper Thrift interface and asked for a
> replication count of 2, the transaction would have failed.
> Yet, I expect that if I asked for replication count 2, I should get it. At
> some point. Eventually. The data has been inserted.
> I expect the cluster to work toward replication count 2 regardless of the
> current state of the cluster --- is there a way to achieve this behavior?

There are a couple of things going on here.

The big one is that after a fresh start A doesn't know what other
nodes "should" be part of the cluster.  Cassandra does assume that you
reach a "good" state before bad stuff starts happening (in production
this turns out to be quite reasonable).  So what you need to do is
bring up A/B/C, then turn things off, rather than just bring up A by
itself to start with.

The second one is that there's a different time scale for "eventually"
between "eventually, all the replicas are in sync" and "eventually,
the failure detector will notice that a node is down and not route
requests to it."  The first is in ms (with a few caveats mostly the
same as in the Dynamo paper), the second is in seconds (a handful for
a small cluster, maybe 15 for a large one).
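
As for the "asked for replication count 2, the transaction would have
failed" part: that is the blocking kind of write -- one that asks for
more acknowledgements than there are reachable replicas is rejected up
front rather than left to catch up later.  A toy model of that behaviour
(this is not the real Thrift API; the names are invented):

    # Illustration only -- these names are not the Cassandra Thrift API.
    class NotEnoughReplicas(Exception):
        pass

    def blocking_write(live_replicas, key, value, acks_required):
        # Apply the write to every reachable replica, then check the count.
        acked = 0
        for replica in live_replicas:
            replica[key] = value          # each "replica" here is just a dict
            acked += 1
        if acked < acks_required:
            raise NotEnoughReplicas("got %d acks, needed %d"
                                    % (acked, acks_required))

    node_a = {}
    try:
        # Only A is reachable, but the caller insists on two replicas.
        blocking_write([node_a], "k", "v", acks_required=2)
    except NotEnoughReplicas as err:
        print("write rejected:", err)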

> Question 3:
>   "balancing"
>      This question is similar to question 2, approached from a different angle.
>      I have three nodes which I brought up at the dawn of time. They've
> taken a lot of inserts, and have 1T each.
>      Let's say the load now is mostly reads, as the data has already been
> inserted
>      I bring up a fourth node.
>      Clients (aka app servers) are pointing at the first 3 nodes. I have to
> reconfigure those servers to start using the 4th server, right?

Depends on your infrastructure.  I'm a fan of round-robin DNS.  Or you
can use a software or hardware load balancer.  Or you can ask the
cluster what machines are in it and manually balance from your client
app, but that is my least favorite option.
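
For that last option, the idea is just to rotate requests across
whatever node list the client knows about.  A minimal sketch (the host
names are invented, and connect() stands in for opening a real client
connection):

    # Minimal client-side round-robin sketch; hosts are invented.
    from itertools import cycle

    nodes = cycle(["cass1.example.com", "cass2.example.com",
                   "cass3.example.com", "cass4.example.com"])

    def connect(host):
        return "connection to " + host    # placeholder for a real connection

    for _ in range(6):
        print(connect(next(nodes)))       # each request hits the next node in turn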

>      New writes may take advantage of the 4th server, but no data will
> automatically move?

If you specify -b when starting the 4th node, the right data will be
copied to it.  (Then, you need to manually tell the other nodes to
"cleanup" -- remove data that doesn't belong on them.  This is not
automatic since if the nodes are "missing" as in the answer to #2 you
can shoot yourself in the foot here.)
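
A rough toy picture of why cleanup is a separate, manual step (again,
nothing here is real Cassandra code, which decides ownership by token
ranges): bootstrap copies the new node's share of the data but does not
delete it at the source, so the old nodes keep the extra copies until
cleanup drops them.

    # Toy illustration only -- real ownership is decided by token ranges.
    old_node = {"a": 1, "b": 2, "c": 3}
    new_node_keys = {"b", "c"}            # keys the new node now owns

    # Bootstrap copies the data but leaves the originals in place:
    new_node = {k: old_node[k] for k in new_node_keys}

    # "cleanup" is the explicit step that removes what no longer belongs:
    for k in new_node_keys:
        old_node.pop(k, None)

    print(old_node)   # {'a': 1}
    print(new_node)   # {'b': 2, 'c': 3}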

> Thanks for the hints - I'm clearly not "getting" Cassandra yet and don't
> want to foolishly misrepresent it.

These are not dumb questions.  Carry on. :)

-Jonathan