Posted to user@cassandra.apache.org by Philippe Dupont <pd...@teads.tv> on 2013/12/05 15:11:22 UTC

Replacing a Node using Replication

Hi,
We currently have a 28-node C* cluster on m1.xlarge instances using vnodes
and are encountering a RAID issue with one of them.

The first solution could be to decommission this node and insert a new one
into the cluster, but since we use vnodes we would need to run 28 cleanups
after adding a node, and this number will increase as our cluster grows.
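(For reference, that path is roughly the following. This is a sketch only:
the hostnames are placeholders, and the script just echoes the commands,
since they need a live cluster to run against.)

```shell
#!/bin/sh
# Dry-run sketch of the decommission-and-cleanup path.
# run() only prints each command; swap it for: run() { "$@"; } to execute.
run() { echo "+ $*"; }

OLD_NODE=old-node                       # placeholder: the node with the failed RAID

# Stream the old node's ranges to the rest of the ring and leave it.
run nodetool -h "$OLD_NODE" decommission

# After the replacement has bootstrapped, reclaim stale ranges on every
# remaining node -- one cleanup per node:
for host in node01 node02 node03; do    # placeholders: all 28 remaining nodes
  run nodetool -h "$host" cleanup
done
```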

Ideally, I would like to duplicate the defective node onto a new one and
switch them without impacting the cluster: that would avoid the
decommission and all the streaming from the old node, which could then be
removed immediately.

Is there any way to do this?

Thanks,

Philippe

Re: Replacing a Node using Replication

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Dec 5, 2013 at 8:31 AM, Andre Sprenger <an...@getanet.de> wrote:

> We just migrated a Cassandra cluster on EC2 to another instance type. We
> replaced one server after another; this raises problems similar to what
> you describe.
>
> We simply stop Cassandra, copy the complete data dir to an EBS volume,
> terminate the server, launch a new server with the same IP, copy the data
> dir back from the EBS volume, and start Cassandra on the new server.
>

You don't even need to keep the same IP to do this operation.

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

=Rob
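(The linked post boils down to roughly the following cassandra.yaml setting
on the replacement node. This is a sketch of the idea only; check the post
and your Cassandra version before relying on it.)

```yaml
# cassandra.yaml on the replacement node (sketch):
# the node already holds the old node's data directory, including the system
# keyspace with its saved tokens, so skip bootstrap streaming entirely;
# it will rejoin the ring with those tokens even though its IP changed.
auto_bootstrap: false
```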

Re: Replacing a Node using Replication

Posted by Andre Sprenger <an...@getanet.de>.
We just migrated a Cassandra cluster on EC2 to another instance type. We
replaced one server after another; this raises problems similar to what
you describe.

We simply stop Cassandra, copy the complete data dir to an EBS volume,
terminate the server, launch a new server with the same IP, copy the data
dir back from the EBS volume, and start Cassandra on the new server.

Hinted handoff will replay the updates that the replaced node missed, as
long as you finish within the max_hint_window_in_ms window. We also
repaired the new node, but this should not be necessary.
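(The procedure above might look like the following. A sketch under
assumptions: the data directory, EBS mount point, and service name are
guesses for a typical install, and the script only echoes the commands,
since they need live hosts to run against.)

```shell
#!/bin/sh
# Dry-run sketch of the copy-via-EBS node replacement.
# run() only prints each command; swap it for: run() { "$@"; } to execute.
run() { echo "+ $*"; }

DATA_DIR=/var/lib/cassandra/data   # assumed default data dir; adjust to your install
EBS_MOUNT=/mnt/ebs                 # assumed mount point of the attached EBS volume

# 1. On the old node: flush memtables and stop writes, then copy data out.
run nodetool drain
run sudo service cassandra stop
run rsync -a "$DATA_DIR/" "$EBS_MOUNT/data/"

# 2. Terminate the old instance, attach the volume to the new one
#    (launched with the same private IP), then copy the data back and start.
run rsync -a "$EBS_MOUNT/data/" "$DATA_DIR/"
run sudo service cassandra start
```

The whole swap has to complete within max_hint_window_in_ms for hinted
handoff to cover the gap, as noted above.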




2013/12/5 Philippe Dupont <pd...@teads.tv>

> Hi,
> We currently have a 28-node C* cluster on m1.xlarge instances using vnodes
> and are encountering a RAID issue with one of them.
>
> The first solution could be to decommission this node and insert a new one
> into the cluster, but since we use vnodes we would need to run 28 cleanups
> after adding a node, and this number will increase as our cluster grows.
>
> Ideally, I would like to duplicate the defective node onto a new one and
> switch them without impacting the cluster: that would avoid the
> decommission and all the streaming from the old node, which could then be
> removed immediately.
>
> Is there any way to do this?
>
> Thanks,
>
> Philippe
>