Posted to user@cassandra.apache.org by Juho Mäkinen <ju...@gmail.com> on 2010/07/21 22:30:40 UTC

Correct steps how to extend cluster size and RF

I'm about to extend my current two-node production cluster into a
five-node cluster, and I'd like to be sure that my plan is correct.

Currently the cluster has two nodes with RF=2. The target is to add
four nodes, increase RF to 3, and drop one of the old nodes.

My current plan is:
1) Add one node with RF=3 but keep the clients connecting only to the
two old nodes. As I'm doing many reads with ConsistencyLevel.ONE, this
should prevent the clients from getting exceptions about missing keys.

2) Restart both old nodes with a configuration that has RF=3.
Subsequent inserts should now be propagated to the new third node.

3) Execute "nodetool repair" on the new node. After this, all three
nodes should have all the data.

4) Tell the clients they can now also connect to the new node.

5) Add the three remaining nodes one at a time, waiting for each
bootstrap to complete before adding the next. Also add the nodes to
the client connection list.

6) Execute "nodetool decommission" on the old node that is to be
dropped.

7) Execute "nodetool loadbalance" on nodes if needed.

Can somebody spot any big problem with the plan?

I'm also thinking about adding one node in another data center to act
as a live backup node. The idea is that every key would have a copy on
the backup machine. If I'm correct, this can be done with
RackAwareStrategy, as stated on the Operation wiki page. No clients
would read from this backup machine. Is this even possible, and if it
is, would it be wise, or should I just do backups by snapshotting the
cluster files as suggested on the Operation wiki page? I'm currently
using RackUnawareStrategy, and I'm not even sure whether it can be
changed without cluster downtime.

 - Juho Mäkinen

Re: Correct steps how to extend cluster size and RF

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Jul 21, 2010 at 1:30 PM, Juho Mäkinen <ju...@gmail.com> wrote:
> I'm just about to extend my current two node production cluster into
> five node cluster and I'd like to be sure that my plan is correct.
>
> Currently cluster has two nodes with RF=2. The target is to add four
> nodes, increase RF to 3 and drop one of the old nodes.
>
> My current plan is:
> 1) Add one node with RF=3 but keep the clients connecting only to the
> two old nodes. As I'm doing many reads with ConsistencyLevel.ONE, this
> should prevent the clients getting exceptions about missing keys.
>
> 2) Restart both old nodes with configuration that has RF=3. The
> following inserts should now be propagated to the new 3rd node.

At this point, CL.ONE reads to the old nodes will return no data for
1/3 of the reads (because they incorrectly believe that they have a
copy of all the data locally).

> 7) Execute "nodetool loadbalance" to nodes if needed.

It will be less painful (in terms of I/O and CPU consumed) to
calculate the right locations ahead of time and bring up the new nodes
with the appropriate InitialToken specified.
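
As a rough illustration of calculating the locations ahead of time,
the sketch below computes evenly spaced tokens for a ring of a given
size. It assumes RandomPartitioner with a token space of 0 to 2**127
(the thread does not say which partitioner is in use), and the
function name balanced_tokens is made up for this example.

```python
# Assumption: RandomPartitioner, whose token space runs from 0 to 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one evenly spaced token per node for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact; Python ints do not overflow.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Five nodes is the target cluster size mentioned in the thread.
    for index, token in enumerate(balanced_tokens(5)):
        print("node %d: InitialToken = %d" % (index, token))
```

The existing nodes already own tokens, so in practice the values for
the new nodes would have to be chosen relative to those, or the old
nodes moved to match.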

> I'm also thinking about the possibility to add one node to another
> data center which would act as a live backup node. The idea would be
> that all keys should have a copy in the backup machine. If I'm
> correct, this can be done with RackAwareStrategy as stated in
> Operation wiki page.

RackAwareStrategy will put a copy of *each* key in both datacenters.
This is usually problematic if one DC has far fewer machines than the
other.
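
To make the imbalance concrete, here is a small arithmetic sketch. It
assumes a single backup node in the second DC, evenly distributed
tokens in the main DC, and that exactly one replica of every key goes
to the other DC; the function name per_node_share is hypothetical.

```python
from fractions import Fraction


def per_node_share(main_dc_nodes, replication_factor):
    """Return (main_node_share, backup_node_share) as fractions of all keys.

    Assumptions: one backup node alone in its DC, one replica of each key
    placed in that DC, remaining replicas spread evenly over the main DC.
    """
    if main_dc_nodes < 1 or replication_factor < 2:
        raise ValueError("need at least 1 main node and RF >= 2")
    # The lone backup node receives a copy of every key.
    backup_share = Fraction(1)
    # The other RF-1 replicas are divided among the main DC nodes;
    # a node can hold at most one copy of a key, so cap the share at 1.
    main_share = min(Fraction(replication_factor - 1, main_dc_nodes), Fraction(1))
    return main_share, backup_share


if __name__ == "__main__":
    main_share, backup_share = per_node_share(main_dc_nodes=5, replication_factor=3)
    print("each main DC node holds", main_share, "of the keys")
    print("the backup node holds", backup_share, "of the keys")
```

Under these assumptions, with five main nodes and RF=3, each main node
holds 2/5 of the keys while the backup node holds all of them.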

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com