Posted to user@cassandra.apache.org by Carl Lerche <me...@carllerche.com> on 2013/08/01 18:35:31 UTC

How often to run `nodetool repair`

Hello,

I read in the docs that `nodetool repair` should be regularly run unless no
delete is ever performed. In my app, I never delete, but I heavily use the
ttl feature. Should repair still be run regularly? Also, does repair take
less time if it is run regularly? If not, is there a way to incrementally
run it? It seems that when I do run repair, it takes a long time and causes
high amounts of CPU usage and iowait.

Thoughts?

Thanks,
Carl

Re: How often to run `nodetool repair`

Posted by rash aroskar <ra...@gmail.com>.
We observed the same behavior. During the last repair, the data distribution
across nodes was also imbalanced, resulting in one node bloating.

Re: How often to run `nodetool repair`

Posted by horschi <ho...@gmail.com>.
> TTL is effectively DELETE; you need to run a repair once every
> gc_grace_seconds. If you don't, data might un-delete itself.
>

The undelete part is not true. btw: With CASSANDRA-4917 TTLed columns will
not even create a tombstone (assuming ttl > gc_grace).

The rest of your mail I agree with :-)

Re: How often to run `nodetool repair`

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Aug 1, 2013 at 1:16 PM, Andrey Ilinykh <ai...@gmail.com> wrote:

>
> On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> TTL is effectively DELETE; you need to run a repair once every
>> gc_grace_seconds. If you don't, data might un-delete itself.
>>
>
> How is it possible? Every replica has the TTL, so when it expires every
> replica has a tombstone. I don't see how you can get data with no tombstone.
> What am I missing?
>

I knew I had heard of cases where repair is required despite TTL, but
didn't recall the specifics. Thanks for the opportunity to go look it up...

http://comments.gmane.org/gmane.comp.db.cassandra.user/21008

quoting Sylvain Lebresne :
"
The initial question was about "can I use inserting with ttl=1 instead of
issuing deletes", ***so that would be a case where you do shadow a previous
version with a very small ttl and so repair is important.*** (EMPHASIS
rcoli)

But you're right that if you only issue data with expiration (no deletes)
and that you
  * either do not overwrite columns
  * or are sure that when you do overwrite, the value you're overwriting has
     a ttl that is lesser or equal than the ttl of the value you're
     overwriting with (+gc_grace to be precise)
then yes, ***repair is not necessary because you can't have shadowed value
resurfacing.*** (EMPHASIS rcoli)
"

So, to be more precise with my initial statement :

"TTL is like DELETE in some cases, so unless you are certain that you are
not (and will not be) in those cases, you should run repair when using TTL."
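
For anyone who wants to check their own write pattern against the rule Sylvain describes, here is a minimal Python sketch. It reads the quoted condition literally and assumes both writes happen at roughly the same time; the function name and arguments are hypothetical, not part of any Cassandra API.

```python
DAY = 24 * 60 * 60  # seconds


def overwrite_is_repair_safe(old_ttl, new_ttl, gc_grace):
    """Literal reading of the quoted rule (all values in seconds).

    An overwrite is treated as safe when the overwritten value's ttl is
    less than or equal to the overwriting value's ttl plus gc_grace.
    A value with no ttl (None) never expires, so overwriting it with a
    TTLed value is never treated as safe here.
    """
    if old_ttl is None:
        return False
    return old_ttl <= new_ttl + gc_grace


if __name__ == "__main__":
    # Long-lived value shadowed by a short ttl: the case where repair matters.
    print(overwrite_is_repair_safe(30 * DAY, 1 * DAY, 10 * DAY))
    # Short-lived value overwritten by a longer-lived one.
    print(overwrite_is_repair_safe(1 * DAY, 30 * DAY, 10 * DAY))
```

If any overwrite in a column family fails this check, that column family falls into the "TTL is like DELETE" cases above.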

Also, if you skip repair for the column families that meet these criteria,
you will be unable to repair entire keyspaces; you will have to repair on a
per column family basis, manually excluding those CFs, which increases
management complexity.

=Rob

Re: How often to run `nodetool repair`

Posted by Erik Forkalsud <ef...@cj.com>.
On 08/01/2013 01:16 PM, Andrey Ilinykh wrote:
>
>     TTL is effectively DELETE; you need to run a repair once every
>     gc_grace_seconds. If you don't, data might un-delete itself.
>
>
> How is it possible? Every replica has the TTL, so when it expires every
> replica has a tombstone. I don't see how you can get data with no
> tombstone. What am I missing?
>

The only way I can think of is this scenario:

    - value "A" for some key is written with ttl=30days, to all replicas
      (i.e. a long ttl or no ttl at all)
    - value "B" for the same key is written with ttl=1day, but doesn't
      reach all replicas
    - one day passes and the ttl=1day values turn into deletes
    - gc_grace passes and the tombstones are purged

At this point, the replica that didn't get the ttl=1day value will think
the older value "A" is live.

I'm no expert on this so I may be mistaken, but in any case it's a 
corner case as overwriting columns with shorter ttls would be unusual.
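
The scenario above can be sketched as a toy Python simulation. This is a deliberately simplified model, not how Cassandra stores data: each replica holds a single cell per key, the newest write wins, and an expired cell is purged once gc_grace has passed. The class and function names are made up for illustration.

```python
DAY = 24 * 60 * 60  # seconds


class Replica:
    """Toy replica holding one cell: (write_time, value, ttl or None)."""

    def __init__(self):
        self.cell = None

    def write(self, write_time, value, ttl=None):
        # Last write wins: a newer timestamp shadows the older cell.
        if self.cell is None or write_time >= self.cell[0]:
            self.cell = (write_time, value, ttl)

    def purge(self, now, gc_grace):
        # Drop an expired cell once gc_grace has passed since it expired.
        if self.cell is None:
            return
        write_time, _, ttl = self.cell
        if ttl is not None and now >= write_time + ttl + gc_grace:
            self.cell = None

    def read(self, now):
        if self.cell is None:
            return None
        write_time, value, ttl = self.cell
        if ttl is not None and now >= write_time + ttl:
            return None  # expired, behaves like a delete
        return value


def repair(replicas):
    # Copy the newest cell found on any replica to all replicas.
    newest = max((r.cell for r in replicas if r.cell is not None),
                 key=lambda cell: cell[0])
    for r in replicas:
        r.write(*newest)


def run_scenario(repair_before_purge, gc_grace=10 * DAY):
    replicas = [Replica(), Replica(), Replica()]
    for r in replicas:
        r.write(0, "A", ttl=30 * DAY)       # "A" reaches every replica
    for r in replicas[:2]:
        r.write(3600, "B", ttl=1 * DAY)     # "B" misses the third replica
    if repair_before_purge:
        repair(replicas)
    now = 3600 + 1 * DAY + gc_grace + 1     # "B" has expired and been purged
    for r in replicas:
        r.purge(now, gc_grace)
    return [r.read(now) for r in replicas]


if __name__ == "__main__":
    print(run_scenario(repair_before_purge=False))
    print(run_scenario(repair_before_purge=True))
```

Without the repair step, the third replica still answers with "A" after the other two have purged "B"; with a repair inside the gc_grace window, all three replicas agree the key is gone.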


- Erik -


Re: How often to run `nodetool repair`

Posted by Andrey Ilinykh <ai...@gmail.com>.
On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche <me...@carllerche.com> wrote:
>
>> I read in the docs that `nodetool repair` should be regularly run unless
>> no delete is ever performed. In my app, I never delete, but I heavily use
>> the ttl feature. Should repair still be run regularly? Also, does repair
>> take less time if it is run regularly? If not, is there a way to
>> incrementally run it? It seems that when I do run repair, it takes a long
>> time and causes high amounts of CPU usage and iowait.
>>
>
> TTL is effectively DELETE; you need to run a repair once every
> gc_grace_seconds. If you don't, data might un-delete itself.
>

How is it possible? Every replica has the TTL, so when it expires every
replica has a tombstone. I don't see how you can get data with no tombstone.
What am I missing?

Andrey

Re: How often to run `nodetool repair`

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche <me...@carllerche.com> wrote:

> I read in the docs that `nodetool repair` should be regularly run unless
> no delete is ever performed. In my app, I never delete, but I heavily use
> the ttl feature. Should repair still be run regularly? Also, does repair
> take less time if it is run regularly? If not, is there a way to
> incrementally run it? It seems that when I do run repair, it takes a long
> time and causes high amounts of CPU usage and iowait.
>

TTL is effectively DELETE; you need to run a repair once every
gc_grace_seconds. If you don't, data might un-delete itself. Even if you
don't care about data un-deleting itself, you still need to run repair
occasionally to ensure overall consistency. Hinted handoff and read repair
are only optimizations and do not have an official responsibility for
providing consistency.

If you struggle with the overhead of repair, one way to reduce the pain is
to increase gc_grace_seconds. The default of 10 days is arbitrary and IMO
too low; something more like 30 days will reduce the fixed very-high cost
of repair, at the cost of keeping tombstones around for 3x as long.
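
To put numbers on that trade-off, here is a small Python sketch of the arithmetic. It assumes one full repair per node must complete within every gc_grace_seconds window; the helper name is hypothetical.

```python
DAY = 24 * 60 * 60  # seconds


def min_repairs_per_year(gc_grace_seconds):
    """Fewest repair cycles per node per year if one must finish
    within every gc_grace_seconds window."""
    return 365 * DAY / gc_grace_seconds


default_gc_grace = 10 * DAY    # 864000 seconds
proposed_gc_grace = 30 * DAY   # 2592000 seconds

if __name__ == "__main__":
    print(min_repairs_per_year(default_gc_grace))
    print(min_repairs_per_year(proposed_gc_grace))
```

Tripling gc_grace_seconds cuts the required number of repair cycles to a third, and tombstones live three times as long.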

If you are running a version below 1.2.6, especially below 1.2.0, the
combination of TTL with repair can lead to insane over-repair.

https://issues.apache.org/jira/browse/CASSANDRA-4905
https://issues.apache.org/jira/browse/CASSANDRA-5398

There is a mechanism for incremental (manually managed..) repair.

https://issues.apache.org/jira/browse/CASSANDRA-3912
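
A rough Python sketch of how the manual bookkeeping might look: split a node's token range into smaller pieces and repair them one at a time. It assumes the -st/-et (start/end token) options described in that ticket are available in your version, so check `nodetool help` before relying on them. The keyspace "traces" and column family "spans" are made-up names.

```python
def split_token_range(start, end, pieces):
    """Split the token range (start, end] into contiguous subranges."""
    width = end - start
    if pieces < 1 or width < pieces:
        raise ValueError("need 1 <= pieces <= end - start")
    # Integer arithmetic keeps the bounds exact for very large tokens.
    bounds = [start + width * i // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, column_family, start, end, pieces):
    """Build one repair command line per subrange.

    The -st/-et flags are an assumption based on CASSANDRA-3912.
    """
    return [
        f"nodetool repair -st {s} -et {e} {keyspace} {column_family}"
        for s, e in split_token_range(start, end, pieces)
    ]


if __name__ == "__main__":
    # Hypothetical keyspace, column family, and token range.
    for cmd in repair_commands("traces", "spans", 0, 2**63 - 1, 8):
        print(cmd)
```

Running the pieces one at a time, spread over the gc_grace window, spreads the CPU and iowait cost instead of paying it all at once.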

=Rob

Re: How often to run `nodetool repair`

Posted by Arthur Zubarev <Ar...@Aol.com>.
Cassandra is an excellent choice for write-heavy applications.

Reading large sets of data is not as fast or as easy: you may need to have your client page through it, and you may need to plan slice queries and proper PKs and indexes in advance.

Regards,

Arthur


Re: How often to run `nodetool repair`

Posted by Carl Lerche <me...@carllerche.com>.
Arthur,

Yes, my use case for this Cassandra cluster is analytics. I am building a
Google Dapper-like (application tracing) system. I collect application
traces and write them to Cassandra. Then, I have periodic rollup tasks that
read the data, do some summarization and write it back.

Thoughts on how to manage a write-heavy cluster?

Thanks,
Carl



Re: How often to run `nodetool repair`

Posted by Arthur Zubarev <Ar...@Aol.com>.
Hi Carl,

The ‘repair’ is for data reads. Compaction will take care of the expired data.

The fact that a repair runs long makes me think the nodes are receiving unbalanced amounts of writes instead.

Regards,

Arthur
