Posted to user@cassandra.apache.org by Daniel Doubleday <da...@gmx.net> on 2010/12/09 20:01:07 UTC

Stuck with adding nodes

Hi good people.

I underestimated load during peak times and now I'm stuck with our production cluster. 
Right now it's 3 nodes with RF 3, so everything is everywhere. We have ~300GB data load, ~10MB/sec incoming traffic, and ~50 (peak) reads/sec to the cluster.

The problem derives from our quorum reads/writes: at peak hours one of the machines (which one is random) will fall behind because it's a little slower than the others, and then shortly after that it will drop most read requests. So right now the only way to survive is to take one machine down, making every read/write an ALL operation. It's necessary to take one machine down because otherwise users will wait for timeouts from that overwhelmed machine whenever the client lib chooses it. Since we are a real-time-oriented thing, that's a killer.
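
(For context, the quorum arithmetic with RF 3: a quorum is rf/2 + 1 = 2 replicas, so with all three nodes up we can tolerate one slow machine, but with one node down every quorum operation needs both survivors. A throwaway sketch of that, nothing specific to our setup:)

    # Cassandra's quorum formula: quorum = rf // 2 + 1.
    # Shows why, at RF 3 with one node down, QUORUM degenerates into
    # "all live replicas must answer".
    def quorum(rf):
        return rf // 2 + 1

    rf = 3
    print(quorum(rf))          # 2: with 3 healthy replicas, one laggard can be skipped
    live = rf - 1              # one machine taken down
    print(quorum(rf) == live)  # True: every quorum op now needs ALL remaining nodes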

So now we tried to add 2 more nodes. The problem is that anticompaction takes too long, meaning it is not done when peak hour arrives and the machine that would stream the data to the new node must be taken down. We tried to block ports 7000 and 9160 to that machine because we hoped that would stop traffic and let the machine finish anticompaction. But that did not work because we could not cut the already-existing connections to the other nodes.

Currently I am copying all data files (that's all existing data) from one node to the new nodes, in the hope that I can then manually assign them their new token ranges (nodetool move) and run cleanup.
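
For the token assignment I plan to compute evenly spaced tokens the usual way. A sketch, assuming the default RandomPartitioner and its 0..2**127 token space (node count 5 being our 3 old plus 2 new machines):

    # Evenly spaced initial tokens for an N-node ring under the
    # RandomPartitioner (token space 0 .. 2**127). Sketch only.
    num_nodes = 5              # 3 existing + 2 new
    step = 2 ** 127 // num_nodes
    for i in range(num_nodes):
        print("node %d: token %d" % (i, i * step))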

Obviously I will try this tomorrow (it's been a long day) on a test system, but any advice would be highly appreciated.

Sighs and thanks.
Daniel

smeet.com
Berlin

Re: Stuck with adding nodes

Posted by Daniel Doubleday <da...@gmx.net>.
Thanks for your help Peter.

We gave up and rolled back to our mysql implementation (we did all writes to our old store in parallel so we did not lose anything).
The problem was that every solution we came up with would require at least one major compaction before the new nodes could join, and our cluster could not survive this (in terms of serving requests at reasonable latencies).

But thanks anyway,
Daniel

On Dec 9, 2010, at 8:25 PM, Peter Schuller wrote:

>> Currently I am copying all data files (that's all existing data) from one node to the new nodes, in the hope that I can then manually assign them their new token ranges (nodetool move) and run cleanup.
> 
> Unless I'm misunderstanding you I believe you should be setting the
> initial token. nodetool move would be for a node already in the ring.
> And keep in mind that a nodetool move is currently a
> decommission+bootstrap - so if you're teetering on the edge of
> overload you will want to keep that in mind when moving a node to
> avoid ending up in a worse situation as another node temporarily
> receives more load than usual as a result of increased ring ownership.
> 
>> Obviously I will try this tomorrow (it's been a long day) on a test system, but any advice would be highly appreciated.
> 
> One possibility if you have additional hardware to spare temporarily,
> is to add more nodes than you actually need and then, once you are
> significantly over capacity, you have the flexibility to move nodes
> around to an optimum position and then decommission those machines
> that were only "borrowed". I.e., initial bootstrap of nodes takes a
> shorter amount of time because you're giving them less token space per
> new node. And once all are in the ring, you're free to move things
> around and then free up the hardware.
> 
> (Another option may be to implement throttling of the anti-compaction
> so that it runs very slowly during peak hours, but that requires
> patching cassandra or else firewall/packet-filtering fu and is
> probably more risky than it's worth.)
> 
> -- 
> / Peter Schuller


Re: Stuck with adding nodes

Posted by Peter Schuller <pe...@infidyne.com>.
> Currently I am copying all data files (that's all existing data) from one node to the new nodes, in the hope that I can then manually assign them their new token ranges (nodetool move) and run cleanup.

Unless I'm misunderstanding you I believe you should be setting the
initial token. nodetool move would be for a node already in the ring.
And keep in mind that a nodetool move is currently a
decommission+bootstrap - so if you're teetering on the edge of
overload you will want to keep that in mind when moving a node to
avoid ending up in a worse situation as another node temporarily
receives more load than usual as a result of increased ring ownership.
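
As a back-of-the-envelope illustration of that temporary bump (made-up numbers, evenly spaced tokens, and ignoring how replication spreads the extra range over several successors):

    # While a moved node is decommissioned, its primary range is handed to
    # its successor, which briefly owns twice its usual share of the ring.
    # Illustrative only; replica placement softens this in practice.
    nodes = 5
    steady = 1.0 / nodes       # normal ownership: 20%
    during_move = 2 * steady   # successor's share until bootstrap finishes: 40%
    print("steady %.0f%%, during move %.0f%%" % (steady * 100, during_move * 100))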

> Obviously I will try this tomorrow (it's been a long day) on a test system, but any advice would be highly appreciated.

One possibility if you have additional hardware to spare temporarily,
is to add more nodes than you actually need and then, once you are
significantly over capacity, you have the flexibility to move nodes
around to an optimum position and then decommission those machines
that were only "borrowed". I.e., initial bootstrap of nodes takes a
shorter amount of time because you're giving them less token space per
new node. And once all are in the ring, you're free to move things
around and then free up the hardware.
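
Roughly, each new node only has to bootstrap its share of the ring, so the
per-node work shrinks as you add more nodes at once. A sketch (it ignores
replication factor, so multiply by RF for the actual on-disk amount; the
300GB figure is from your first mail):

    # Per-node primary-range share as a function of ring size.
    total_gb = 300.0           # ~300GB of data, per the original mail
    for ring_size in (5, 8):   # "just enough" nodes vs. borrowed extra hardware
        print("%d nodes: ~%.0f GB of primary range per new node"
              % (ring_size, total_gb / ring_size))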

(Another option may be to implement throttling of the anti-compaction
so that it runs very slowly during peak hours, but that requires
patching cassandra or else firewall/packet-filtering fu and is
probably more risky than it's worth.)
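
(If you did go down that road, the throttling itself is conceptually just a
rate-limited copy loop. A generic sketch of the idea, not an existing
Cassandra hook or API:)

    # Generic byte-rate throttle, the sort of thing one would have to patch
    # into the anti-compaction write path. Purely illustrative.
    import time

    def throttled_copy(read_chunk, write_chunk, max_bytes_per_sec):
        # read_chunk() returns b"" at EOF; write_chunk(data) writes a chunk.
        start, written = time.time(), 0
        while True:
            chunk = read_chunk()
            if not chunk:
                break
            write_chunk(chunk)
            written += len(chunk)
            budget = written / float(max_bytes_per_sec)  # seconds this much data "should" take
            elapsed = time.time() - start
            if budget > elapsed:
                time.sleep(budget - elapsed)             # sleep off the surplus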

-- 
/ Peter Schuller