Posted to user@cassandra.apache.org by "Hiller, Dean" <De...@nrel.gov> on 2013/02/28 23:39:02 UTC

-pr vs. no -pr

Isn't it true that if I have 6 nodes (RF=3), I could run nodetool repair on just 2 nodes instead of using nodetool repair -pr?

What is the advantage of -pr then?

I mean a repair involves all three nodes and pushes and pulls data, right?

Thanks,
Dean

Re: -pr vs. no -pr

Posted by Jim Cistaro <jc...@netflix.com>.

One other slight advantage of -pr...

We sometimes have repairs that hang and need to be killed and restarted.
With -pr, you only have to "redo" a fraction of the work.

jc

Re: -pr vs. no -pr

Posted by "Hiller, Dean" <De...@nrel.gov>.

Sweet, I 100% understand this now from these last few emails.  It has always been a bit confusing.

Thanks,
Dean

Re: -pr vs. no -pr

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Thu, Feb 28, 2013 at 11:39 PM, Hiller, Dean <De...@nrel.gov> wrote:

> Isn't it true if I have 6 nodes, I could run nodetool repair on just 2
> nodes (RF=3) instead of using nodetool repair -pr???
>

Yes, it is true.

To be more precise, in your case you have 2 options:
 1) doing repair *without* -pr on 2 nodes (assuming you pick the correct 2
nodes, it's *not* any 2 nodes)
 2) doing a repair *with* -pr on the 6 nodes

Both of those cases would 1) repair the full ring and 2) do the same amount
of work.

> What is the advantage of -pr then?

As it happens, yours is a special case: your number of nodes is a multiple
of your replication factor. If that weren't the case (say 5, 7 or 8 nodes
with RF=3), there would be *no way* to repair the whole cluster *without*
-pr without doing *more* work than a repair *with* -pr on all nodes.

So the advantages of -pr (which, by the way, should be used for repairing
the whole cluster, not for rebuilding a specific node) are:
 1) it always does the minimum amount of work, while repair without -pr is
wasteful if the number of nodes is not a multiple of the replication
factor (no matter how smart you are at scheduling the repairs).
 2) even if your number of nodes is a multiple of the replication factor,
you still have to make sure you pick the right N/RF nodes to repair if you
don't use -pr. If you don't pick the correct ones, you will not repair the
full ring. With -pr it is much harder to shoot yourself in the foot: you
have to run it on every node, period.
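
The arithmetic behind both points can be checked with a small model. This is
only a sketch under stated assumptions: one token per node, replicas placed on
the next RF-1 nodes of the ring, and "work" counted as the number of token
ranges a repair run touches. The function names are made up for illustration.

```python
from itertools import combinations

def ranges_held(node, n, rf):
    """Ranges touched by a repair *without* -pr on `node`: its own primary
    range plus the primary ranges of the rf - 1 nodes before it on the ring."""
    return {(node - k) % n for k in range(rf)}

def smallest_full_repair_set(n, rf):
    """Brute-force the fewest nodes whose non -pr repairs cover every range."""
    for size in range(1, n + 1):
        for nodes in combinations(range(n), size):
            covered = set().union(*(ranges_held(x, n, rf) for x in nodes))
            if len(covered) == n:
                return nodes
    return tuple(range(n))  # not reached: all n nodes always cover the ring

def work_summary(n, rf):
    nodes = smallest_full_repair_set(n, rf)
    return {
        "nodes": nodes,
        "work_without_pr": len(nodes) * rf,  # rf ranges per repaired node
        "work_with_pr": n,                   # one range per node, all n nodes
    }

for n in (5, 6, 7, 8):
    print(n, work_summary(n, 3))
```

In this model, 6 nodes need two non -pr repairs (nodes 0 and 3, and not, for
example, nodes 0 and 1) for 6 units of work either way, while 5, 7 and 8 nodes
cost 6, 9 and 9 units without -pr against 5, 7 and 8 with it.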

--
Sylvain

Re: -pr vs. no -pr

Posted by Tristan Seligmann <mi...@mithrandi.net>.

On Fri, Mar 1, 2013 at 12:39 AM, Hiller, Dean <De...@nrel.gov> wrote:
> Isn't it true if I have 6 nodes, I could run nodetool repair on just 2 nodes (RF=3) instead of using nodetool repair -pr???
>
> What is the advantage of -pr then?

I think the main advantage of -pr is that you don't have to
calculate / determine which nodes to run repair on; you can just run
nodetool repair -pr on every node without doing redundant work. If you
run nodetool repair without -pr, then you need to run repair against
only every RFth node (every third node, in your case, which for 6
nodes total is 2 nodes, as you said) to avoid doing redundant work.
The end result is the same as far as I know.
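
The two approaches can be written down as a list of commands. This is a
minimal sketch: the host names are hypothetical, the ring is assumed to be
listed in token order with one token per node, and `repair_plan` is a made-up
helper, not part of nodetool.

```python
def repair_plan(ring, rf, use_pr):
    """Return the nodetool commands for one full pass over the ring.

    ring   -- host names in token order (hypothetical names)
    rf     -- replication factor
    use_pr -- True: -pr on every node; False: plain repair on every rf-th node
    """
    if use_pr:
        return [f"nodetool -h {host} repair -pr" for host in ring]
    return [f"nodetool -h {host} repair" for host in ring[::rf]]

ring = ["A", "B", "C", "D", "E", "F"]
print(repair_plan(ring, 3, use_pr=False))
print(repair_plan(ring, 3, use_pr=True))
```

When the ring size is not a multiple of rf, the every-rf-th slice in this
sketch still covers the ring, but some ranges end up being repaired twice.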
-- 
mithrandi, i Ainil en-Balandor, a faer Ambar

Re: -pr vs. no -pr

Posted by Michael Theroux <mt...@yahoo.com>.

The way I've always thought about it is that -pr makes sure the information a specific node originates is consistent with its replicas.

So, we know that a node is responsible for a specific token range, and the next nodes in the ring hold its replicas.  With -pr, repair makes sure that a specific node's information is consistent with its replicas, but does not make sure that node has all the replicated information it can get from the nodes before it in the ring.

Without the -pr option, the current node not only makes sure its own information is consistent with its replicas, but also makes sure that all the information it holds as a replica for other nodes is consistent.

If you run regular repairs on all the nodes in your cluster, then -pr is sufficient.  Every node will run repair, and make sure its information is consistent with its replicas, eventually creating a fully consistent cluster.  This is a quicker process, and will have less impact on your operations by essentially spreading out the pain.  

For instance, we run a 12 node cluster.  We run "nodetool repair -pr" on nodes that are opposite to each other, 4 nodes a day (2 nodes in the morning, 2 nodes in the evening).  With a grace period of 10 days, this allows us to run repairs twice a week on a specific node, and to occasionally skip repairs on specific nodes once a week.  

In this case, without -pr, a lot of extra work would be done.  In fact, with an RF of 3 (in our case), the time per repair would increase many fold.
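
A schedule like the one described can be sketched as follows. The assumptions
are mine: nodes are numbered 0 to 11 in ring order, "opposite" means half a
ring apart, and a -pr repair on a node involves that node plus the next RF-1
nodes. The helper names are hypothetical.

```python
def replicas_touched(node, n, rf):
    """Nodes involved in a -pr repair started on `node`: itself plus the
    next rf - 1 nodes on the ring (assumed placement)."""
    return {(node + k) % n for k in range(rf)}

def daily_schedule(n, pairs_per_day=2):
    """Split the n/2 opposite-node pairs into days of `pairs_per_day` slots."""
    half = n // 2
    pairs = [(i, i + half) for i in range(half)]
    return [pairs[d:d + pairs_per_day] for d in range(0, half, pairs_per_day)]

days = daily_schedule(12)
for day, slots in enumerate(days, start=1):
    print(f"day {day}: morning {slots[0]}, evening {slots[1]}")

# Opposite nodes share no replicas at RF=3, so paired repairs do not overlap.
assert all(
    replicas_touched(a, 12, 3).isdisjoint(replicas_touched(b, 12, 3))
    for slots in days for a, b in slots
)
```

Under these assumptions a full pass over the 12 nodes takes 3 days, which is
consistent with repairing each node about twice a week and leaves slack inside
a 10-day grace period.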

Another way to think about it, although likely not 100% technically correct:

A repair -pr will cause a push of a node's information to its replicas.  Without the -pr, it will cause a push, and it will cause nodes it is a replica for to push their information as well.

-Mike

Re: -pr vs. no -pr

Posted by "Hiller, Dean" <De...@nrel.gov>.

Isn't there more to it than that?  You really have nodes responsible for
token ranges like so (using describe ring).

What we see from our describe ring is this (1 to 6 are token ranges while
A to F are servers):
A - 1, 2, 3
B - 2, 3, 4
C - 3, 4, 5
D - 4, 5, 6
E - 5, 6, 1
F - 6, 1, 2

With -pr, only token range 1 is repaired I think, right?  2 and 3 are only
repaired without the -pr option?  This means that if I have a node that
just joined the cluster, I should "not" be using -pr, as 2 and 3 on node A
will not be up to date.  Using -pr is nice if I am going to repair every
single node, and is nice for the cron job that has to happen before
gc_grace_seconds.  Am I wrong here?  I.e., -pr is really only good for use
in the cron job, as it would miss 2 and 3 above.  I could run the cron on
just two servers, but then my nodes are set up differently, which can be a
hassle.
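
The table above can be reproduced and checked with a short sketch. It assumes,
as this message does, that the first range listed for each server is the one
-pr repairs, and (as the other replies in this thread describe) that repairing
a range involves every replica of that range. All names are illustrative.

```python
from collections import Counter

SERVERS = "ABCDEF"
RF = 3

def ranges_on(index, n=len(SERVERS), rf=RF):
    """Token ranges held by the server at `index`, numbered 1..n as in the table."""
    return [(index + k) % n + 1 for k in range(rf)]

table = {s: ranges_on(i) for i, s in enumerate(SERVERS)}
primary = {s: held[0] for s, held in table.items()}   # first column of the table
owner_of = {rng: s for s, rng in primary.items()}      # range -> server that repairs it with -pr

def repaired_by(server):
    """For each range on `server`, the server whose -pr run repairs it."""
    return {rng: owner_of[rng] for rng in table[server]}

# How often each range is repaired when every server runs repair once.
pr_runs = Counter(primary.values())                               # with -pr
full_runs = Counter(r for held in table.values() for r in held)   # without -pr

print(table)
print(repaired_by("A"))
```

In this model, ranges 2 and 3 on A are not missed when every server runs -pr:
they are repaired by the runs on B and C. Running without -pr everywhere
repairs each range three times.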

Can you confirm that this is what you believe happens as well?

Thanks,
Dean

Re: -pr vs. no -pr

Posted by "Takenori Sato(Cloudian)" <ts...@cloudian.com>.

Hi,

Please note that I confirmed this on v1.0.7.

> I mean a repair involves all three nodes and pushes and pulls data, right?

Yes, but that's how -pr works. A repair without -pr does more.

For example, suppose you have a ring with RF=3 like this.

A - B - C - D - E - F

Then, a repair on A without -pr covers 3 ranges, as follows:
[A, B, C]
[E, F, A]
[F, A, B]

Among them, the first one, [A, B, C] is the primary range of A.

So, with -pr, a repair runs only for:
[A, B, C]
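
The replica sets listed above can be derived mechanically. This is a minimal
sketch assuming one token per node and replicas on the next RF-1 nodes of the
ring; the function names are hypothetical.

```python
RING = ["A", "B", "C", "D", "E", "F"]
RF = 3

def replica_set(start, ring=RING, rf=RF):
    """Replicas of the range whose primary owner sits at index `start`."""
    return [ring[(start + k) % len(ring)] for k in range(rf)]

def repaired_sets(node, pr, ring=RING, rf=RF):
    """Replica sets touched by a repair on `node`, primary range first."""
    i = ring.index(node)
    if pr:
        return [replica_set(i, ring, rf)]
    # Without -pr, also repair the ranges this node holds as a replica.
    earlier = [replica_set((i - k) % len(ring), ring, rf) for k in range(rf - 1, 0, -1)]
    return [replica_set(i, ring, rf)] + earlier

print(repaired_sets("A", pr=False))
print(repaired_sets("A", pr=True))
```

Calling `repaired_sets("D", pr=False)` gives the other three replica sets,
which is why A and D together cover the whole ring in this model.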

> I could run nodetool repair on just 2 nodes (RF=3) instead of using nodetool repair -pr???

Yes.

You need to run two repairs: one on A and one on D.

> What is the advantage of -pr then?

Whenever you want to minimize the impact of repair.

For example, suppose one node was down for a while and you bring it back
to the cluster.

You need to run repair without affecting the entire cluster. In that case,
-pr is the option.

Thanks,
Takenori
