Posted to user@cassandra.apache.org by Mohit Anchlia <mo...@gmail.com> on 2013/08/03 00:28:27 UTC

Automated Repair on multiple nodes

We currently run automated repairs sequentially on all the nodes. However,
as we grow the cluster we now need to run repair on multiple nodes in
parallel to be able to finish it within gc_grace_seconds. Before I write
the script I was wondering if somebody already has a tool or a script that
figures out nodes that we can safely run repairs on in parallel. For
instance we wouldn't run repair on replica nodes in parallel.
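
A rough sketch of the grouping rule described above, in Python. It rests on assumptions that are not stated in this thread: one token per node (no vnodes), a single datacenter, replicas placed on consecutive ring neighbours (SimpleStrategy-style), and primary-range repair (nodetool repair -pr) on each node. Under those assumptions, two nodes share no replicas for their primary ranges when they sit at least RF positions apart on the ring. The function name and node names are hypothetical.

```python
def parallel_repair_batches(ring, rf):
    """Group nodes into batches that could be repaired concurrently.

    ring: node names in token order (assumes one token per node).
    rf:   replication factor (assumes replicas on consecutive ring nodes).
    Nodes in the same batch are at least `rf` ring positions apart.
    """
    if rf < 1:
        raise ValueError("rf must be >= 1")
    n = len(ring)
    remaining = list(range(n))
    batches = []
    while remaining:
        batch = []
        for i in remaining:
            # Ring distance is measured both ways to handle wraparound.
            if all(min((i - j) % n, (j - i) % n) >= rf for j in batch):
                batch.append(i)
        batches.append([ring[i] for i in batch])
        chosen = set(batch)
        remaining = [i for i in remaining if i not in chosen]
    return batches


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3", "n4", "n5"]
    for number, batch in enumerate(parallel_repair_batches(nodes, 3), 1):
        print("batch", number, batch)
```

Each batch would be run to completion before the next one starts. With N nodes this gives at most about N/RF concurrent sessions per batch, and it does not account for racks, multiple datacenters, or vnodes.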

Re: Automated Repair on multiple nodes

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Aug 2, 2013 at 3:28 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> We currently run automated repairs sequentially on all the nodes. However,
> as we grow the cluster we now need to run repair on multiple nodes in
> parallel to be able to finish it within gc_grace_seconds.
>
Or you could just increase gc_grace_seconds from the arbitrary and IMO
unreasonably low default of 10 days.

> Before I write the script I was wondering if somebody already has a tool
> or a script that figures out nodes that we can safely run repairs on in
> parallel. For instance we wouldn't run repair on replica nodes in parallel.
>
This will only really work with non-virtual nodes, if you repair
hardware-node-wide. With 256 virtual nodes per node, your repair overhead
will also be evenly distributed.

Someone has probably written the script, but if I were you I would consider
whether you really want to monitor N/RF fragile and independent repair
sessions simultaneously before using such a script.

=Rob