Posted to user@cassandra.apache.org by Paul Pak <pp...@yellowseo.com> on 2011/03/07 21:28:44 UTC

recommended way to grow a cluster?

Hello,

I'm doing some testing of Cassandra and I've read a lot about people
running into problems when growing their clusters.  So, I'm about to test
it with 0.7.3.  I've got a test setup which is a single node with a
replication factor of 1.  I'd like to grow it to 3 nodes and a
replication factor of 2.  Then, I'd like to grow the cluster to 10 nodes
with a replication factor of 3.

I'm trying to determine a few things.
a) What kind of load will it put on a cluster that is "in use"?
b) How long will it take?
c) Are there any gotchas in the process?
d) How easy/difficult will this be?

If anyone has any experience with growing clusters, please share.    If
someone can give an idea of what I need to do for growing the clusters
properly, I'll be happy to do it and report back anything I find.

Paul

Re: recommended way to grow a cluster?

Posted by Peter Schuller <pe...@infidyne.com>.
> - When adding nodes to a cluster it's more efficient if you can change the
> range to existing nodes to be a subset of what they were responsible for
> previously. So the node only has to stream out data, rather than stream out
> and stream in data. Say you have this contrived example (where values are

Also, doubling is usually the easiest thing to do because it only
involves inserting new nodes at appropriate places. Any increase that
involves moving nodes is a bit more of an issue because moving a node
implies decommission+bootstrap. If your cluster is not under
significant load it is not a huge problem, but if you're trying to
execute a cluster expansion live with significant amounts of live
traffic, you may not want to remove any of your existing nodes even
temporarily.

So, if possible, I'd recommend doubling. You can get around the
problem of move being a decommission+bootstrap by jumping through some
extra hoops (essentially using an 'extra' node so you can do
insertions followed by removals instead of moves), but it's more of a
hassle.
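
To illustrate why doubling avoids moves, here is a minimal sketch. It
assumes RandomPartitioner (token space 0 to 2**127) and evenly spaced
tokens; balanced_tokens is a hypothetical helper for illustration, not
part of Cassandra.

```python
# Sketch only: evenly spaced tokens, assuming RandomPartitioner,
# whose token space runs from 0 to 2**127.
RING_SIZE = 2 ** 127

def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

old = balanced_tokens(3)
doubled = balanced_tokens(6)
grown_to_five = balanced_tokens(5)

# Doubling: every existing token is still in the new layout, so the
# existing nodes stay put and new nodes are inserted between them.
existing_nodes_stay = set(old) <= set(doubled)
new_tokens = sorted(set(doubled) - set(old))

# Growing 3 -> 5: only token 0 survives, so the other existing nodes
# would have to move to reach a balanced ring.
kept = set(old) & set(grown_to_five)

print("doubling keeps existing tokens:", existing_nodes_stay)
print("tokens for the new nodes:", new_tokens)
print("tokens kept when growing 3 -> 5:", kept)
```

In the doubled layout each new token falls halfway between two existing
ones, which is why the existing nodes only need to stream data out.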

-- 
/ Peter Schuller

Re: recommended way to grow a cluster?

Posted by aaron morton <aa...@thelastpickle.com>.
I do not know of any articles I could send your way, and others may have some tales from running production systems. But here are a few thoughts; others, please correct me if I am wrong:

- The replication factor is not intended to be changed on a running system. It can be, but it will be a heavyweight process: http://wiki.apache.org/cassandra/Operations#Replication

- When adding nodes to a cluster it's more efficient if you can change the ranges of existing nodes to be a subset of what they were responsible for previously, so that a node only has to stream out data, rather than stream out and stream in data. Say you have this contrived example (where values are real numbers between 1 and 10):
	- node a -> values 1 to 3
	- node b -> values 4 to 6
	- node c -> values 7 to 10

	And you want to add node d:
	- If you add node d to handle values between 2 and 3, node a can stream that data over to node d and then delete the data it is no longer responsible for.
	- If you want a more balanced ring, you may want to change all the ranges to be:
		- node a -> values 1 to 2.5
		- node b -> values 2.5 to 5.0
		- node c -> values 5.0 to 7.5
		- node d -> values 7.5 to 10
	In this case there are a lot of moves; for example, node b has to both send data to node c and get data from node a.

	AFAIK the easier path when growing is to double the number of nodes. Cassandra does support more complicated moves, but they may require a lot of resources. How this impacts your system depends on load, data size and IO capacity.
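
	The contrived example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ranges are treated as contiguous half-open intervals (so node a is 1 to 4 and node b is 4 to 7), the simple split gives node d the range 2 to 4, there is a single replica and no ring wrap-around, and overlap and transfers are hypothetical helpers, not Cassandra APIs.

```python
# Sketch only: which node must stream which range when ownership changes.
# Assumes contiguous half-open ranges, one replica, and no ring wrap-around.

def overlap(x, y):
    """Return the intersection of two (start, end) ranges, or None."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo < hi else None

def transfers(old, new):
    """List (sender, receiver, range) for data that changes owner."""
    moves = []
    for receiver, new_range in new.items():
        for sender, old_range in old.items():
            if sender == receiver:
                continue  # data a node already owns does not move
            shared = overlap(old_range, new_range)
            if shared:
                moves.append((sender, receiver, shared))
    return moves

old = {"a": (1, 4), "b": (4, 7), "c": (7, 10)}

# Option 1: node d takes over part of node a's range.
split = {"a": (1, 2), "d": (2, 4), "b": (4, 7), "c": (7, 10)}

# Option 2: rebalance every range.
rebalanced = {"a": (1, 2.5), "b": (2.5, 5.0), "c": (5.0, 7.5), "d": (7.5, 10)}

split_moves = transfers(old, split)
rebalanced_moves = transfers(old, rebalanced)

# Nodes that must both send and receive under the rebalanced layout.
senders = {s for s, _, _ in rebalanced_moves}
receivers = {r for _, r, _ in rebalanced_moves}
both = senders & receivers

print(split_moves)
print(rebalanced_moves)
print(sorted(both))
```

	With these assumptions the split produces a single transfer (a to d), while the rebalanced layout produces three, and nodes b and c each have to stream data both in and out.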

Hope that helps. 
Aaron
