Posted to user@cassandra.apache.org by Jack Krupansky <ja...@gmail.com> on 2016/03/29 05:04:53 UTC

Acceptable repair time

Someone recently asked me for advice when their repair time was 2-3 days. I
thought that was outrageous, but not unheard of. Personally, 2-3
hours would be about the limit of what I could tolerate, and my personal
goal would be that a full repair of a node should take no longer than an
hour, maybe 90 minutes tops. But... achieving those more abbreviated repair
times would strongly suggest that the amount of data on each node be kept
down to a tiny fraction of a typical spinning disk drive, or even a
fraction of a larger SSD drive.

So, my question here is what people consider acceptable full repair times
for nodes and what the resulting node data size is.

What impact vnodes has on these numbers is a bonus question.

Thanks!

-- Jack Krupansky

Re: Acceptable repair time

Posted by Anishek Agarwal <an...@gmail.com>.
We have about 380 GB of data; with RF = 3 that comes to ~1200 GB on disk.
Since we are on 2.0.17, there is no incremental repair :(

On Tue, Mar 29, 2016 at 6:05 PM, Kai Wang <de...@gmail.com> wrote:

> IIRC when we switched to LCS and ran the first full repair with
> 250GB/RF=3, it took at least 12 hours for the repair to finish, then
> another 3+ days for all the compaction to catch up. I called it "the big
> bang of LCS".
>
> Since then we've been running nightly incremental repair.
>
> For me as long as it's reliable (no streaming error, better progress
> reporting etc), I actually don't mind if it takes more than a few hours to
> do a full repair. But I am not sure about 4 days... I guess it depends on
> the size of the cluster and data...

Re: Acceptable repair time

Posted by Kai Wang <de...@gmail.com>.
IIRC when we switched to LCS and ran the first full repair with 250GB/RF=3,
it took at least 12 hours for the repair to finish, then another 3+ days
for all the compaction to catch up. I called it "the big bang of LCS".
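
As a rough back-of-envelope check on the numbers above, the sketch below turns that one observed full repair (250 GB in at least 12 hours) into an effective rate and asks how much data would fit in shorter repair windows. It treats 250 GB as the amount of data covered by that repair and assumes repair time scales linearly with data size; both are simplifying assumptions, not something claimed in this thread, and the function names are made up for illustration. Because 12 hours was a lower bound, the sizes it prints are upper bounds.

```python
# Back-of-envelope repair sizing.
# Assumption (not from the thread): repair time grows linearly with the
# amount of data being repaired.


def implied_rate_gb_per_hour(data_gb: float, repair_hours: float) -> float:
    """Effective repair rate implied by one observed full repair."""
    if repair_hours <= 0:
        raise ValueError("repair_hours must be positive")
    return data_gb / repair_hours


def max_data_gb(rate_gb_per_hour: float, target_hours: float) -> float:
    """Largest data size that fits in the target window at the given rate."""
    return rate_gb_per_hour * target_hours


# Observation reported above: 250 GB took at least 12 hours to repair.
rate = implied_rate_gb_per_hour(250, 12)

# Repair windows mentioned earlier in the thread: 1 hour, 90 minutes, 3 hours.
for target_hours in (1.0, 1.5, 3.0):
    size_gb = max_data_gb(rate, target_hours)
    print(f"{target_hours} h window -> at most about {size_gb:.0f} GB")
```

Under these assumptions, a one-hour full repair would only cover a few tens of GB, which fits the point made earlier in the thread that short repair times imply a small amount of data per node.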

Since then we've been running nightly incremental repair.

For me as long as it's reliable (no streaming error, better progress
reporting etc), I actually don't mind if it takes more than a few hours to
do a full repair. But I am not sure about 4 days... I guess it depends on
the size of the cluster and data...

On Tue, Mar 29, 2016 at 6:04 AM, Anishek Agarwal <an...@gmail.com> wrote:

> I would really like to know the answer to the above because on some nodes
> repair takes almost 4 days for us :(.

Re: Acceptable repair time

Posted by Anishek Agarwal <an...@gmail.com>.
I would really like to know the answer to the above because on some nodes
repair takes almost 4 days for us :(.

On Tue, Mar 29, 2016 at 8:34 AM, Jack Krupansky <ja...@gmail.com>
wrote:

> Someone recently asked me for advice when their repair time was 2-3 days.
> I thought that was outrageous, but not unheard of. Personally, 2-3
> hours would be about the limit of what I could tolerate, and my personal
> goal would be that a full repair of a node should take no longer than an
> hour, maybe 90 minutes tops. But... achieving those more abbreviated repair
> times would strongly suggest that the amount of data on each node be kept
> down to a tiny fraction of a typical spinning disk drive, or even a
> fraction of a larger SSD drive.
>
> So, my question here is what people consider acceptable full repair times
> for nodes and what the resulting node data size is.
>
> What impact vnodes has on these numbers is a bonus question.
>
> Thanks!
>
> -- Jack Krupansky
>