Posted to user@cassandra.apache.org by Oleg Dulin <ol...@gmail.com> on 2012/10/09 21:56:24 UTC

1.1.1 is "repair" still needed ?

My understanding is that repair has to happen within the gc_grace period.

But in 1.1.1 you can set gc_grace by CF. A couple of my CFs that are 
frequently updated have gc_grace of 1 hour, but we do run a weekly 
repair.

So the question is: is repair still needed? Do we even need to run 
nodetool repair?

If gc_grace is 10 days on all other CFs, are we saying that as long as 
we restart that node within the 10-day period we don't need to run 
nodetool repair?

The reason I bring this up is that repair occasionally runs for more 
than a day on some of these nodes (500+ GB of data), and it causes 
slowness with read requests.


-- 
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/



Re: 1.1.1 is "repair" still needed ?

Posted by Watanabe Maki <wa...@gmail.com>.
Oh, sorry. That's good to know.


On 2012/10/12, at 0:18, "B. Todd Burruss" <bt...@gmail.com> wrote:

> As of 1.0 (CASSANDRA-2034), hints are generated for nodes that time out.
> 
> On Thu, Oct 11, 2012 at 3:55 AM, Watanabe Maki <wa...@gmail.com> wrote:
>> Even if HH works fine, hints will not be created until the failure detector marks the node as dead.
>> Also, as I understand it, hints are not created for a mutation request that partially times out but still meets the CL.
>> 
>> 
>> [ snip ]

Re: 1.1.1 is "repair" still needed ?

Posted by "B. Todd Burruss" <bt...@gmail.com>.
As of 1.0 (CASSANDRA-2034), hints are generated for nodes that time out.

On Thu, Oct 11, 2012 at 3:55 AM, Watanabe Maki <wa...@gmail.com> wrote:
> Even if HH works fine, hints will not be created until the failure detector marks the node as dead.
> Also, as I understand it, hints are not created for a mutation request that partially times out but still meets the CL.
>
>
> [ snip ]

Re: 1.1.1 is "repair" still needed ?

Posted by Watanabe Maki <wa...@gmail.com>.
Even if HH works fine, hints will not be created until the failure detector marks the node as dead.
Also, as I understand it, hints are not created for a mutation request that partially times out but still meets the CL.


On 2012/10/11, at 5:55, Rob Coli <rc...@palominodb.com> wrote:

> On Tue, Oct 9, 2012 at 12:56 PM, Oleg Dulin <ol...@gmail.com> wrote:
>> My understanding is that the repair has to happen within gc_grace period.
>> [ snip ]
>> So the question is, is this still needed ? Do we even need to run nodetool
>> repair ?
> 
> If Hinted Handoff works in your version of Cassandra, and that version
> is > 1.0, you "should" not need to repair if no node has crashed or
> been down for longer than max_hint_window_in_ms. This is because after
> 1.0, any failed write to a remote replica results in a hint, so any
> DELETE should eventually be fully replicated.
> 
> [ snip ]

Re: 1.1.1 is "repair" still needed ?

Posted by Rob Coli <rc...@palominodb.com>.
On Tue, Oct 9, 2012 at 12:56 PM, Oleg Dulin <ol...@gmail.com> wrote:
> My understanding is that the repair has to happen within gc_grace period.
> [ snip ]
> So the question is, is this still needed ? Do we even need to run nodetool
> repair ?

If Hinted Handoff works in your version of Cassandra, and that version
is > 1.0, you "should" not need to repair if no node has crashed or
been down for longer than max_hint_window_in_ms. This is because after
1.0, any failed write to a remote replica results in a hint, so any
DELETE should eventually be fully replicated.

However, hinted handoff is meaningfully broken between 1.1.0 and 1.1.6
(unreleased), so you cannot rely on the above heuristic for
consistency. In these versions, you have to repair (or read repair
100% of keys) once every GCGraceSeconds to prevent the possibility of
zombie data. If it were possible to repair on a per-columnfamily
basis, you could get a significant win by only repairing
columnfamilies which take DELETE traffic.

https://issues.apache.org/jira/browse/CASSANDRA-4772
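My own reading of the rule above, sketched as a decision function; this is not an official policy, just the thread's heuristic (version range per CASSANDRA-4772) made explicit:

```python
# Rough sketch (unofficial): periodic repair can be skipped only if
# hinted handoff is trustworthy in your version AND no node has been
# down longer than max_hint_window_in_ms.

def hh_is_broken(version):
    """Hinted handoff is meaningfully broken from 1.1.0 up to the
    (then-unreleased) 1.1.6; see CASSANDRA-4772."""
    return (1, 1, 0) <= version < (1, 1, 6)

def must_repair(version, max_outage_ms, max_hint_window_ms):
    if version < (1, 0):
        return True   # pre-1.0, hints don't cover timed-out writes
    if hh_is_broken(version):
        return True   # hints cannot be relied on at all
    # Hints cover an outage only if it fit inside the hint window.
    return max_outage_ms > max_hint_window_ms

print(must_repair((1, 1, 1), 0, 3600000))        # -> True
print(must_repair((1, 0, 12), 60000, 3600000))   # -> False
```

On 1.1.1 the function always says repair; on a version with working hints, it only says repair after an outage longer than the hint window.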

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb

Re: 1.1.1 is "repair" still needed ?

Posted by Aaron Turner <sy...@gmail.com>.
On Tue, Oct 9, 2012 at 8:56 PM, Oleg Dulin <ol...@gmail.com> wrote:
> My understanding is that the repair has to happen within gc_grace period.
>
> [ snip ]

My understanding is:

As long as all your nodes are in sync, repair isn't needed.  But if
you have a tombstone which isn't replicated to all the nodes for
whatever reason, then the deleted data can come back.  Repair
guarantees that all the nodes that should have gotten the tombstones
actually got them.


-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"