Posted to user@cassandra.apache.org by Katriel Traum <ka...@google.com> on 2014/04/24 15:21:30 UTC

Impact of adding nodes to cluster

Hello list,

I have a cluster of 3 nodes with RF=3. The cluster load is a daily bulk
write/delete/compact, followed by reads for the rest of the time.
For better read performance, and to make sure data is 100% consistent, we
write with "ALL" and read with "ONE", stopping the write process if there is
a problem.

My problem is that I've maxed out my network cards. I do not have separate
cards for inter-node communication.
The knee-jerk reaction was to add more nodes, but I'm not sure what
replication factor to set, 3 or 6.
My first thought was to leave it at 3, but with 6 nodes, if the coordinator
does not have the data, it will read it from another node:
1. Data Node -> Coordinator
2. Coordinator -> Client
This effectively means that I double the amount of data going through the
system.

Setting it to RF=6 means that every node will always have a replica of the
data, and a read with "ONE" will always be served by the coordinator itself.
The downside is that I lose the added value of redundancy during the write
cycle.
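
As a rough sanity check on the two options above, the read-side arithmetic
can be sketched in a few lines of Python. This is only a back-of-the-envelope
model, not a measurement: it assumes the client picks a coordinator uniformly
at random (no token-aware routing), that a read at "ONE" moves the full
payload exactly once per hop, and it ignores read repair, digest requests and
all write traffic. The function names are made up for illustration.

```python
def read_payload_copies(nodes: int, rf: int) -> float:
    """Expected number of times one read's payload is sent over the network.

    Assumption: the coordinator is chosen uniformly at random, so it holds
    a replica of the requested row with probability rf / nodes.
    """
    if not 1 <= rf <= nodes:
        raise ValueError("need 1 <= rf <= nodes")
    p_local = rf / nodes
    # One copy always goes coordinator -> client; one more copy goes
    # data node -> coordinator whenever the coordinator has no local replica.
    return 1.0 + (1.0 - p_local)


def per_node_outbound_share(nodes: int, rf: int) -> float:
    """Average outbound payload copies per node per read (even load assumed)."""
    return read_payload_copies(nodes, rf) / nodes


if __name__ == "__main__":
    for nodes, rf in [(3, 3), (6, 3), (6, 6)]:
        total = read_payload_copies(nodes, rf)
        share = per_node_outbound_share(nodes, rf)
        print(f"nodes={nodes} RF={rf}: copies/read={total:.2f}, per-node={share:.3f}")
```

Under these assumptions, 6 nodes with RF=3 send the payload 1.5 times per
read on average rather than 2, and because the load is spread over twice as
many nodes, each node's outbound share per read drops from about 0.33 to
0.25 of a payload. RF=6 brings it to about 0.17, but this model says nothing
about the extra replication traffic during the write cycle.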

Does anyone have an insight or idea if my assumptions are correct? Does
inter-node communication really add all this network overhead?

Thanks,
Katriel

Re: Impact of adding nodes to cluster

Posted by John Pyeatt <jo...@singlewire.com>.
Leave the RF at 3, especially since you write with consistency ALL. It's
actually a really bad idea to have your RF set to the same value as the
number of nodes you have: if one of your nodes goes down, your writes will
fail. In fact, I would suggest leaving your RF at 3 and setting both read and
write consistency to QUORUM.
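
To make the trade-off concrete, here is a small sketch of the usual
consistency arithmetic. It relies on two standard rules: a quorum is
floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest
write when the read and write replica counts add up to more than RF. The
helper names are hypothetical and only the ONE, QUORUM and ALL levels are
modelled.

```python
def required_acks(level: str, rf: int) -> int:
    """Replicas that must respond for a given consistency level."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def summarize(write_cl: str, read_cl: str, rf: int) -> dict:
    """Consistency and availability properties of a read/write CL pair."""
    w = required_acks(write_cl, rf)
    r = required_acks(read_cl, rf)
    return {
        # R + W > RF means every read set overlaps every write set.
        "reads_see_latest_write": r + w > rf,
        # Number of replicas that can be down before the operation fails.
        "write_tolerates_down": rf - w,
        "read_tolerates_down": rf - r,
    }


if __name__ == "__main__":
    print("ALL/ONE      :", summarize("ALL", "ONE", rf=3))
    print("QUORUM/QUORUM:", summarize("QUORUM", "QUORUM", rf=3))
```

With RF=3, both combinations keep reads consistent with the latest write,
but ALL/ONE tolerates no down replica on the write path, while QUORUM/QUORUM
tolerates one down replica for both reads and writes.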


-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
------------------
608.661.1184
john.pyeatt@singlewire.com