Posted to user@cassandra.apache.org by Robert Wille <rw...@fold3.com> on 2015/01/23 19:03:56 UTC

nodetool repair options

nodetool repair has some options that I don’t understand. Reading the documentation doesn’t exactly make things more clear. I’m running a 2.0.11 cluster with vnodes and a single data center.

The docs say "Use -pr to repair only the first range returned by the partitioner". What does this mean? Why would I only want to repair the first range?

What are the tradeoffs of a parallel versus serial repair?

What are the recommended options for regular, periodic repair?
 
Thanks

Robert

Re: nodetool repair options

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Jan 23, 2015 at 10:03 AM, Robert Wille <rw...@fold3.com> wrote:

> The docs say "Use -pr to repair only the first range returned by the
> partitioner". What does this mean? Why would I only want to repair the
> first range?
>

If you're repairing the whole cluster, repairing only the primary range on
each node avoids repairing every range RF (replication factor) times.
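
A toy model may make the point concrete. This is only a sketch under simplifying assumptions: a ring without vnodes, one primary range per node, and replicas placed on the next RF - 1 nodes around the ring. The function names are hypothetical and nothing here calls Cassandra itself.

```python
from collections import Counter


def replicated_ranges(node, num_nodes, rf):
    """Ranges held by a node: its own primary range plus the primary
    ranges of the rf - 1 nodes before it on the ring (assumed placement)."""
    return [(node - i) % num_nodes for i in range(rf)]


def repair_counts(num_nodes, rf, primary_only):
    """Count how often each range gets repaired when repair is run
    once on every node in the cluster."""
    counts = Counter()
    for node in range(num_nodes):
        if primary_only:
            ranges = [node]  # -pr: only the node's primary range
        else:
            ranges = replicated_ranges(node, num_nodes, rf)
        counts.update(ranges)
    return counts


full = repair_counts(num_nodes=6, rf=3, primary_only=False)
pr = repair_counts(num_nodes=6, rf=3, primary_only=True)

print(sorted(full.values()))  # every range repaired rf times
print(sorted(pr.values()))    # every range repaired exactly once
```

In this model, running a plain repair on all six nodes touches each range three times, while running it with -pr on all six nodes touches each range once.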

> What are the tradeoffs of a parallel versus serial repair?
>

Parallel repair affects all replicas simultaneously and can thereby degrade
latency for that replica set. Serial repair doesn't, but it works through
replicas one at a time and is much slower. Serial repair is probably not
usable at all with RF>5 or
so, unless you set an extremely long gc_grace_seconds.

> What are the recommended options for regular, periodic repair?
>

(Snapshot/incremental repair, default IIRC in newer Cassandra, changes many
of these assumptions. I refer to "old-style" nodetool repair with my
statements.)

The canonical response is to repair the entire cluster with -pr once per
gc_grace_seconds.
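
As a rough illustration of what "the entire cluster with -pr" amounts to, the sketch below builds one `nodetool repair -pr` command line per node. The host names and keyspace name are placeholders, and the helper `repair_commands` is hypothetical; it only prints the commands and does not run them.

```python
def repair_commands(hosts, keyspace=None):
    """Build one 'nodetool repair -pr' command per host.

    Each node must be repaired in turn, since -pr covers only that
    node's primary range(s).
    """
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        if keyspace:
            cmd.append(keyspace)  # optional: restrict to one keyspace
        commands.append(cmd)
    return commands


# Placeholder host and keyspace names, for illustration only.
hosts = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]
for cmd in repair_commands(hosts, keyspace="my_keyspace"):
    print(" ".join(cmd))
```

Skipping any node leaves that node's primary ranges unrepaired for the cycle, so the list of hosts needs to cover the whole ring.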

Regarding frequent repair... consider your RF, CL and whether you actually
care about consistency and durability for any given columnfamily. If you
never do DELETE-like-operations (in CQL, this includes things other than
DELETE statements) in the CF, probably don't repair it just for consistency
purposes.

Then, consider how long you can tolerate DELETEd data sticking around. If
you can tolerate it because you don't DELETE much data, set
gc_grace_seconds to at least 34 days. With 34 days, you can begin a repair
on the first of the month and have between 3 and 7 days for it to complete.
You repair for up to a few days in order to repair a month's data. With
shorter repair cycles, you pay the relatively high cost of repair
repeatedly.
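
The arithmetic behind the 34-day figure can be sketched as follows. The assumption here is that a repair starts on the first of every month, so the margin left for a repair to finish is the grace period minus the length of the month; the helper name `slack_days` is hypothetical. Under this simple calendar model the margin works out to 3 to 6 days, roughly the window described above.

```python
import calendar

GC_GRACE_DAYS = 34
# Value to use for gc_grace_seconds, which is expressed in seconds.
gc_grace_seconds = GC_GRACE_DAYS * 24 * 60 * 60


def slack_days(year, month, grace_days=GC_GRACE_DAYS):
    """Days left for a repair to complete, assuming repairs start on
    the first of each month and must finish within the grace period."""
    days_in_month = calendar.monthrange(year, month)[1]
    return grace_days - days_in_month


slacks = [slack_days(2015, month) for month in range(1, 13)]

print(gc_grace_seconds)          # grace period in seconds
print(min(slacks), max(slacks))  # tightest and loosest monthly margin
```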

Last, consider your Cassandra version. Newer versions have had significant
focus on streaming and repair stability and performance. Upgrade to the
HEAD of 2.0.x if possible.

There's this thing I jokingly call the Coli Conjecture, which says that if
you're in a good case for Cassandra you probably don't actually care
about consistency or durability, even if you think you do. This comes from
years of observing consistency edge cases in Cassandra and noticing that
very few people, even among those who detected and reported them, seemed to
experience very negative results from the perspective of their application.
I think it is an interesting observation and a different mindset for many
people coming from the non-distributed, normalized, relational world.

=Rob