Posted to user@cassandra.apache.org by Edward Sargisson <ed...@globalrelay.net> on 2012/08/28 01:52:51 UTC

Automating nodetool repair

Hi all,
So nodetool repair has to be run regularly on all nodes. Does anybody 
have any interesting strategies or tools for doing this or is everybody 
just setting up cron to do it?

For example, one could write some Puppet code to splay the cron times 
around so that only one should be running at once.
Or, perhaps, a central orchestrator that is given some known quiet time 
and works its way through the list, running nodetool repair one at a 
time (using RPC?) until it runs out of time.
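
A minimal sketch of the central-orchestrator idea, assuming nodetool can reach each node remotely with -h (it connects over JMX) and that `date +%s` is available on the orchestrating box. The function name repair_until and the node names are hypothetical. It only checks the clock between nodes, so a repair that is already running is not interrupted when the quiet time ends:

```shell
#!/bin/sh
# Hypothetical orchestrator: repair nodes one at a time until a time budget is used up.
# Usage: repair_until BUDGET_SECONDS node1 node2 ...
repair_until() {
    budget=$1
    shift
    start=$(date +%s)
    for node in "$@"; do
        now=$(date +%s)
        # Stop before starting a new repair once the quiet window has elapsed.
        if [ $((now - start)) -ge "$budget" ]; then
            echo "out of time before $node"
            return 0
        fi
        echo "repairing $node"
        nodetool -h "$node" repair || echo "repair failed on $node" >&2
    done
}
```

For example, `repair_until 14400 node1 node2 node3 node4` would stop starting new repairs after four hours. The sketch does not remember where it stopped, so the next run starts from the first node again.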

Cheers,
Edward
-- 

Edward Sargisson

senior java developer
Global Relay

edward.sargisson@globalrelay.net <ma...@globalrelay.net>


*866.484.6630*
New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore 
(+65.3158.1301)

Global Relay Archive supports email, instant messaging, BlackBerry, 
Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, 
Facebook and more.


Ask about Global Relay Message
<http://www.globalrelay.com/services/message> - The Future of
Collaboration in the Financial Services World

All email sent to or from this address will be retained by Global
Relay's email archiving system. This message is intended only for the 
use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law.  Global Relay will not be liable for 
any compliance or technical information provided herein. All trademarks 
are the property of their respective owners.


Re: Automating nodetool repair

Posted by aaron morton <aa...@thelastpickle.com>.
Staggering the repairs also gives the DynamicSnitch a chance to route around nodes which may be running slowly.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/08/2012, at 11:19 AM, Omid Aladini <om...@gmail.com> wrote:

>>> Secondly, what's the need for sleep 120?
>> 
>> It just gives the cluster a chance to settle down between repairs...
>> there's no real need for it; it's just there "because".
> 
> Actually, repair could cause unreplicated data to be streamed and new
> sstables to be created. New sstables could cause pending compactions
> and increase the potential number of sstables a row could be spread
> across. Therefore you might need more disk seeks to read a row and
> have slower read response time. If the read response time is critical,
> it's a good idea to wait for pending compactions to settle before
> repairing other neighbouring ranges that overlap replicas.
> 
> -- Omid


Re: Automating nodetool repair

Posted by Omid Aladini <om...@gmail.com>.
> > Secondly, what's the need for sleep 120?
>
> It just gives the cluster a chance to settle down between repairs...
> there's no real need for it; it's just there "because".

Actually, repair could cause unreplicated data to be streamed and new
sstables to be created. New sstables could cause pending compactions
and increase the potential number of sstables a row could be spread
across. Therefore you might need more disk seeks to read a row and
have slower read response time. If the read response time is critical,
it's a good idea to wait for pending compactions to settle before
repairing other neighbouring ranges that overlap replicas.
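
As a rough sketch of "wait for pending compactions to settle", the fixed sleep could be replaced by polling. This assumes `nodetool compactionstats` prints a line of the form "pending tasks: N"; check the output of your version before relying on it. The function names, the poll interval and the poll limit are made up for illustration:

```shell
#!/bin/sh
# Print the pending compaction count reported by one node.
# Assumes `nodetool compactionstats` prints a line like "pending tasks: 3".
pending_compactions() {
    nodetool -h "$1" compactionstats | awk -F': *' '/^pending tasks/ { print $2; exit }'
}

# Poll until the node reports no pending compactions, or give up after max_polls.
# Usage: wait_for_compactions NODE [INTERVAL_SECONDS] [MAX_POLLS]
wait_for_compactions() {
    node=$1
    interval=${2:-30}
    max_polls=${3:-120}
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        pending=$(pending_compactions "$node")
        # An empty (unparseable) answer is treated as "still busy".
        if [ "${pending:-1}" -eq 0 ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}
```

Calling `wait_for_compactions node1` after repairing node1, and only then moving on to its neighbour, would approximate the behaviour described above.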

-- Omid


Re: Automating nodetool repair

Posted by Aaron Turner <sy...@gmail.com>.
On Tue, Aug 28, 2012 at 1:42 PM, Edward Sargisson
<ed...@globalrelay.net> wrote:
> Thanks, a very nice approach.
>
> If every nodetool repair uses -pr, does that satisfy the requirement to run a
> repair before GCGraceSeconds expires? In other words, will we get a correct
> result using -pr everywhere?

Yep.

> Secondly, what's the need for sleep 120?

It just gives the cluster a chance to settle down between repairs...
there's no real need for it; it's just there "because".
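
On the GCGraceSeconds requirement, a back-of-the-envelope check can help when choosing how often to run the loop. This is only a conservative rule of thumb, not anything from Cassandra itself: it assumes the worst gap between two repairs of the same range is one schedule period plus the time a full pass takes. The function name and the per-repair duration are hypothetical estimates; 864000 seconds (10 days) is the default gc_grace_seconds:

```shell
#!/bin/sh
# Conservative check: does a repair schedule stay inside gc_grace_seconds?
# Args: node_count, seconds_per_repair (estimate), sleep_seconds,
#       schedule_period_seconds, gc_grace_seconds (defaults to 864000)
schedule_fits_gc_grace() {
    nodes=$1
    per_repair=$2
    pause=$3
    period=$4
    gc_grace=${5:-864000}
    pass=$((nodes * (per_repair + pause)))
    # Worst case assumed here: a range is repaired at the start of one pass
    # and only at the very end of the next one.
    worst_gap=$((period + pass))
    echo "$worst_gap"
    [ "$worst_gap" -le "$gc_grace" ]
}
```

With 4 nodes, an estimated 2 hours per repair, a 120 second sleep and a weekly cron job, `schedule_fits_gc_grace 4 7200 120 604800` prints 634080 and succeeds, so that schedule would fit under the default.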

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"

Re: Automating nodetool repair

Posted by Edward Sargisson <ed...@globalrelay.net>.
Thanks, a very nice approach.

If every nodetool repair uses -pr, does that satisfy the requirement to
run a repair before GCGraceSeconds expires? In other words, will we get a
correct result using -pr everywhere?

Secondly, what's the need for sleep 120?

Cheers,
Edward

On 12-08-28 07:03 AM, Edward Capriolo wrote:
> You can consider adding -pr when iterating through all your hosts
> like this. -pr means primary range, and will do less duplicated work.
>
> On Mon, Aug 27, 2012 at 8:05 PM, Aaron Turner <sy...@gmail.com> wrote:
>> I use cron.  On one box I just do:
>>
>> for n in node1 node2 node3 node4 ; do
>>     nodetool -h $n repair
>>     sleep 120
>> done
>>
>> A lot easier than managing a bunch of individual crontabs IMHO,
>> although I suppose I could have done it with puppet, but then you always
>> have to keep an eye out that your repairs don't overlap over time.



Re: Automating nodetool repair

Posted by Mohit Agarwal <co...@gmail.com>.
Is there any reason why Cassandra doesn't run nodetool repair out of the box
at some fixed interval?

On Tue, Aug 28, 2012 at 9:08 PM, Aaron Turner <sy...@gmail.com> wrote:

> Funny you mention that... I was just hearing on #cassandra this
> morning that it repairs the replica set by default.  I was thinking of
> repairing every 3rd node (RF=3), but running -pr seems "cleaner".
>
> Do you know if this (repairing a replica vs node) was introduced in 1.0 or
> 1.1?

Re: Automating nodetool repair

Posted by Aaron Turner <sy...@gmail.com>.
Funny you mention that... I was just hearing on #cassandra this
morning that it repairs the replica set by default.  I was thinking of
repairing every 3rd node (RF=3), but running -pr seems "cleaner".

Do you know if this (repairing a replica vs node) was introduced in 1.0 or 1.1?
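
For comparison, the "every 3rd node" idea could be sketched like this. It assumes a single data centre, one token per node, and a node list already sorted in ring order; under those assumptions a plain repair (no -pr) on every RF-th node should touch every range at least once. The function name and node names are made up for illustration:

```shell
#!/bin/sh
# Print every rf-th node from a list given in ring (token) order.
# Usage: every_nth_node RF node1 node2 ...
every_nth_node() {
    rf=$1
    shift
    i=0
    for node in "$@"; do
        # Keep positions 0, rf, 2*rf, ... and skip the nodes in between.
        if [ $((i % rf)) -eq 0 ]; then
            echo "$node"
        fi
        i=$((i + 1))
    done
}
```

With RF=3 and six nodes, `every_nth_node 3 n1 n2 n3 n4 n5 n6` prints n1 and n4. Running -pr on every node avoids having to know the ring order at all, which is probably why it feels cleaner.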

On Tue, Aug 28, 2012 at 7:03 AM, Edward Capriolo <ed...@gmail.com> wrote:
> You can consider adding -pr when iterating through all your hosts
> like this. -pr means primary range, and will do less duplicated work.

Re: Automating nodetool repair

Posted by Edward Capriolo <ed...@gmail.com>.
You can consider adding -pr when iterating through all your hosts
like this. -pr means primary range, and will do less duplicated work.
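
Applied to the loop quoted below, that might look like the following sketch. The function name and the REPAIR_PAUSE variable are hypothetical; the 120 second default just mirrors the sleep in the quoted loop:

```shell
#!/bin/sh
# Run a primary-range repair on each host in turn.
# Usage: repair_primary_ranges node1 node2 ...
repair_primary_ranges() {
    failed=0
    for node in "$@"; do
        # -pr limits the repair to the node's primary range, so every node
        # has to be visited for the whole ring to be covered.
        if ! nodetool -h "$node" repair -pr; then
            echo "repair -pr failed on $node" >&2
            failed=$((failed + 1))
        fi
        sleep "${REPAIR_PAUSE:-120}"
    done
    # Succeed only if every node was repaired.
    [ "$failed" -eq 0 ]
}
```

A failure on one node is reported but does not stop the loop, since skipping the remaining nodes would leave their primary ranges unrepaired.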

On Mon, Aug 27, 2012 at 8:05 PM, Aaron Turner <sy...@gmail.com> wrote:
> I use cron.  On one box I just do:
>
> for n in node1 node2 node3 node4 ; do
>    nodetool -h $n repair
>    sleep 120
> done
>
> A lot easier than managing a bunch of individual crontabs IMHO,
> although I suppose I could have done it with puppet, but then you always
> have to keep an eye out that your repairs don't overlap over time.

Re: Automating nodetool repair

Posted by Aaron Turner <sy...@gmail.com>.
I use cron.  On one box I just do:

for n in node1 node2 node3 node4 ; do
   nodetool -h $n repair
   sleep 120
done

A lot easier than managing a bunch of individual crontabs IMHO,
although I suppose I could have done it with puppet, but then you always
have to keep an eye out that your repairs don't overlap over time.
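
Even with a single cron job, one run can overlap the next if a pass takes longer than the cron interval. A minimal guard, sketched here with a lock directory (the with_lock name and any lock path are hypothetical), relies on mkdir being atomic:

```shell
#!/bin/sh
# Run a command only if no other run holds the lock directory.
# mkdir either creates the directory or fails, so only one caller wins.
# Usage: with_lock LOCK_DIR command [args...]
with_lock() {
    lock_dir=$1
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "another repair run appears to be active; skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$lock_dir"
    return "$status"
}
```

The cron entry would then call something like `with_lock /tmp/repair.lock sh repair-all.sh` instead of the script directly. If the job is killed, the directory is left behind and has to be removed by hand.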

On Mon, Aug 27, 2012 at 4:52 PM, Edward Sargisson
<ed...@globalrelay.net> wrote:
> Hi all,
> So nodetool repair has to be run regularly on all nodes. Does anybody have
> any interesting strategies or tools for doing this or is everybody just
> setting up cron to do it?
>
> For example, one could write some Puppet code to splay the cron times around
> so that only one should be running at once.
> Or, perhaps, a central orchestrator that is given some known quiet time and
> works its way through the list, running nodetool repair one at a time (using
> RPC?) until it runs out of time.
>
> Cheers,
> Edward