Posted to user@cassandra.apache.org by KZ Win <kz...@pelotoncycle.com> on 2014/08/01 11:46:22 UTC

how do i know if nodetool repair is finished

I have a 2-node Apache Cassandra (2.0.3) cluster with a replication factor
of 1. I changed the replication factor to 2 using the following command in
cqlsh:

ALTER KEYSPACE "mykeyspace" WITH REPLICATION =
  { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

I then tried to run "nodetool repair", which is the recommended step after
this type of ALTER.

The problem is that this command sometimes finishes very quickly. When it
finishes like that, it normally says 'Lost notification...' and the exit
code is not zero.

So I just repeat 'nodetool repair' until it finishes without error. I also
check that 'nodetool status' reports the expected disk usage for each node.
(With a replication factor of 1, each node holds about 7GB, and I expect
each to hold about 14GB after the repair, assuming no cluster usage in the
meantime.)

Is there a more correct way to determine that 'nodetool repair' is finished
in this case?
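
The retry-until-zero-exit approach described above can be scripted. Below is a minimal sketch in Python, assuming nodetool is on the PATH of the machine running it. The function name repair_until_success, the attempt limit, and the delay are illustrative choices, not anything Cassandra prescribes. As far as I understand, 'Lost notification' means the nodetool client lost track of the repair while the repair itself may still be running on the server, so a non-zero exit is not proof that the repair failed, and retrying immediately may start a second repair on top of the first.

```python
import subprocess
import time


def repair_until_success(keyspace, max_attempts=5, delay_seconds=60,
                         run=subprocess.call, sleep=time.sleep):
    """Run `nodetool repair <keyspace>` until it exits with status 0.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt exits non-zero. `run` and `sleep` are injectable (hypothetical
    parameters for this sketch) so the loop can be tried without a cluster.
    """
    command = ["nodetool", "repair", keyspace]
    for attempt in range(1, max_attempts + 1):
        exit_code = run(command)
        if exit_code == 0:
            return attempt
        # Wait before retrying so a repair that is still running
        # server-side has a chance to finish first.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise RuntimeError(
        "nodetool repair did not exit cleanly after %d attempts" % max_attempts
    )
```

Calling repair_until_success("mykeyspace") would mirror the manual loop from the question. A clean exit only shows that the client saw the repair complete, so checking the server logs is still worthwhile.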

how do i know if nodetool repair is finished

Posted by KZ Win <kz...@pelotoncycle.com>.

Thanks for the great information.

Is it generally safe to accept reads and writes to the cluster while the
repair is going on? I accept that performance may be lower during this
time.

Is it also generally safe to add another node to this cluster, with the
higher replication factor, before the repair is finished?

k.z.


On Fri, Aug 1, 2014 at 1:06 PM, Aiman Parvaiz <aiman@shift.com> wrote:
> This is from an old post, so I am not sure whether something has changed
> in newer C* versions.
>
> If nodetool compactionstats says there are no Validation compactions
> running (and the compaction queue is empty) and netstats says there is
> nothing streaming, there is a good chance the repair is finished or dead.
> If a neighbour dies during a repair, the node it was started on will wait
> for 48 hours(?) until it times out. Check the logs on the machines for
> errors, particularly from the AntiEntropyService. And see what
> compactionstats is saying on all the nodes involved in the repair.
>
> source:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html
>

Re: how do i know if nodetool repair is finished

Posted by Aiman Parvaiz <ai...@shift.com>.

This is from an old post, so I am not sure whether something has changed in
newer C* versions.

If nodetool compactionstats says there are no Validation compactions
running (and the compaction queue is empty) and netstats says there is
nothing streaming, there is a good chance the repair is finished or dead.
If a neighbour dies during a repair, the node it was started on will wait
for 48 hours(?) until it times out. Check the logs on the machines for
errors, particularly from the AntiEntropyService. And see what
compactionstats is saying on all the nodes involved in the repair.

source:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html
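
The check described above could be scripted along these lines. This is only a sketch under assumptions: it takes for granted that an idle node's compactionstats output contains a line reading "pending tasks: 0", that a running validation shows up as a row starting with "Validation", and that netstats prints "Not sending any streams" when nothing is streaming. That wording may differ between Cassandra versions, so the markers should be checked against real output first. The function names nodetool_output, looks_idle, and cluster_looks_idle are made up for this example.

```python
import subprocess


def nodetool_output(host, subcommand):
    """Return the text printed by `nodetool -h <host> <subcommand>`."""
    return subprocess.check_output(["nodetool", "-h", host, subcommand]).decode()


def looks_idle(compactionstats_text, netstats_text):
    """Heuristic from the post: no Validation compaction, an empty
    compaction queue, and nothing streaming.

    The matched strings are assumptions about nodetool's output format.
    """
    compaction_lines = [line.strip().lower()
                        for line in compactionstats_text.splitlines()]
    queue_empty = "pending tasks: 0" in compaction_lines
    validation_running = any(line.startswith("validation")
                             for line in compaction_lines)
    streams_idle = "not sending any streams" in netstats_text.lower()
    return queue_empty and not validation_running and streams_idle


def cluster_looks_idle(hosts, fetch=nodetool_output):
    """Apply the heuristic to every node involved in the repair."""
    return all(
        looks_idle(fetch(host, "compactionstats"), fetch(host, "netstats"))
        for host in hosts
    )
```

An idle result means "finished or dead", as noted above, so the logs on each node still need to be checked for errors to tell the two apart.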
