Posted to user@cassandra.apache.org by Edward Capriolo <ed...@gmail.com> on 2013/06/10 23:53:10 UTC

Re: [Cassandra] Expanding a Cassandra cluster

You should eventually run cleanup to remove data that is no longer needed on
the node. However, it does not need to be run immediately after a join. You
can run it when you get around to it. I would run it on a few nodes at a time
until they are all cleaned up.
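A minimal sketch of the "few nodes at a time" approach, assuming cleanup is driven from one machine that can reach every node with `nodetool -h HOST cleanup`. The host names, `batch_size`, and the helper functions are hypothetical. By default the script only prints the commands it would run (dry run).

```python
import subprocess
from typing import Iterator, List, Sequence


def batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size hosts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


def cleanup_in_batches(hosts: Sequence[str], batch_size: int, dry_run: bool = True) -> None:
    """Run `nodetool -h HOST cleanup` on batch_size nodes at a time.

    All nodes in a batch are cleaned concurrently; the next batch starts only
    after every cleanup in the current batch has exited.
    """
    for group in batches(hosts, batch_size):
        cmds = [["nodetool", "-h", host, "cleanup"] for host in group]
        if dry_run:
            # Only show what would be run.
            for cmd in cmds:
                print(" ".join(cmd))
            continue
        procs = [subprocess.Popen(cmd) for cmd in cmds]
        # Wait for every process in the batch before deciding whether to stop.
        failed = [" ".join(cmd) for proc, cmd in zip(procs, cmds) if proc.wait() != 0]
        if failed:
            raise RuntimeError("cleanup failed: " + "; ".join(failed))


if __name__ == "__main__":
    # Hypothetical host names; batch_size=1 gives the fully sequential run
    # the manual describes.
    hosts = [f"cass{i:02d}.example.com" for i in range(1, 8)]
    cleanup_in_batches(hosts, batch_size=3)
```

Set dry_run=False only once the printed commands look right, and pick batch_size so the extra I/O stays acceptable for your workload.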


On Mon, Jun 10, 2013 at 5:00 PM, Emalayan Vairavanathan <
svemalayan@yahoo.com> wrote:

> Hi All,
>
> The DataStax manual suggests that during a Cassandra cluster expansion, an
> administrator has to run nodetool cleanup on each of the previously
> existing Cassandra nodes to remove the keys that no longer belong to
> those nodes. Further, the manual says that the nodetool cleanup task
> should be run sequentially on the existing Cassandra nodes.
>
> Reference:
> http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-capacity
>
> Here is my problem: I have a very large Cassandra cluster with hundreds of
> nodes, and running nodetool cleanup sequentially will take a long time to
> finish.
>
> Questions:
> a) Can someone tell me about the implications of running nodetool
>    cleanup concurrently on the entire cluster?
> b) Will Cassandra automatically take care of removing obsolete keys in
>    the future?
>
>
> Thank you
> Emalayan
>

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
We run it concurrently on every RF-th node (if RF = 3, we run it in 3 waves).
If a node is busy cleaning up, the client will time out and ask another node
that has a copy of the data and is not being cleaned up.

"Will nodetool cleanup consume a lot of IO and CPU even though there is
nothing to clean?"

Yes, I think so, since it still has to check that there is nothing to clean.
Anyway, I don't think there is any case that needs a regular cleanup.
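A sketch of the wave scheme, assuming SimpleStrategy-style placement where each range is replicated on RF consecutive nodes of the token ring, and that the hypothetical node list is already sorted by token. Node i goes into wave i mod RF, so when the node count is a multiple of RF, any RF consecutive nodes (one range's replica set) fall into different waves and at most one replica of each range is cleaning up at a time. With NetworkTopologyStrategy, or a node count that is not a multiple of RF, this does not hold as written; the checker below reports that case.

```python
from typing import List, Sequence


def assign_waves(ring: Sequence[str], rf: int) -> List[List[str]]:
    """Put the node at ring position i into wave i % rf."""
    if rf < 1:
        raise ValueError("rf must be at least 1")
    waves: List[List[str]] = [[] for _ in range(rf)]
    for i, node in enumerate(ring):
        waves[i % rf].append(node)
    return waves


def replicas_never_share_a_wave(ring: Sequence[str], rf: int) -> bool:
    """Check that every window of rf consecutive ring nodes (wrapping around)
    contains rf different waves, i.e. no two replicas of the same range are
    cleaned in the same wave. Assumes node names are unique."""
    n = len(ring)
    if n < rf:
        return False
    wave_of = {node: i % rf for i, node in enumerate(ring)}
    for start in range(n):
        window = [ring[(start + k) % n] for k in range(rf)]
        if len({wave_of[node] for node in window}) != rf:
            return False
    return True


if __name__ == "__main__":
    ring = [f"node{i}" for i in range(6)]  # hypothetical, sorted by token
    for n, wave in enumerate(assign_waves(ring, rf=3)):
        print(f"wave {n}: {' '.join(wave)}")
    print(replicas_never_share_a_wave(ring, rf=3))
```

With 7 nodes and RF = 3 the check fails, because the wrap-around window (node6, node0, node1) puts node6 and node0 in the same wave.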


2013/6/12 Michal Michalski <mi...@opera.com>

>> What will happen if I add nodetool cleanup to run periodically (similar
>> to nodetool repair)? Will nodetool cleanup consume a lot of IO and CPU even
>> though there is nothing to clean?
>>
>
> Why would you need to do so?
>
> M.

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Michal Michalski <mi...@opera.com>.
> What will happen if I add nodetool cleanup to run periodically (similar to nodetool repair)? Will nodetool cleanup consume a lot of IO and CPU even though there is nothing to clean?

Why would you need to do so?

M.



Re: [Cassandra] Expanding a Cassandra cluster

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Thank you, Robert, and everyone else who replied to my question.

What will happen if I add nodetool cleanup to run periodically (similar to nodetool repair)? Will nodetool cleanup consume a lot of IO and CPU even though there is nothing to clean?

Thank you
Emalayan 


________________________________
 From: Robert Coli <rc...@eventbrite.com>
To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com> 
Sent: Monday, 10 June 2013 5:15 PM
Subject: Re: [Cassandra] Expanding a Cassandra cluster
 

On Mon, Jun 10, 2013 at 3:13 PM, Emalayan Vairavanathan
<sv...@yahoo.com> wrote:
> I suspect that nodetool cleanup is IO intensive, so running nodetool cleanup
> concurrently on the entire cluster may significantly impact the IO
> performance of applications.

cleanup is a specific kind of compaction, and as such respects the
compaction throughput throttle.

The compaction throughput throttle is designed to prevent compaction
from negatively impacting the performance of things-not-compaction. If
you notice that cleanup compaction on all or most nodes consumes too
much i/o, reduce the throttle value.

=Rob

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Jun 10, 2013 at 3:13 PM, Emalayan Vairavanathan
<sv...@yahoo.com> wrote:
> I suspect that nodetool cleanup is IO intensive, so running nodetool cleanup
> concurrently on the entire cluster may significantly impact the IO
> performance of applications.

cleanup is a specific kind of compaction, and as such respects the
compaction throughput throttle.

The compaction throughput throttle is designed to prevent compaction
from negatively impacting the performance of things-not-compaction. If
you notice that cleanup compaction on all or most nodes consumes too
much i/o, reduce the throttle value.

=Rob
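To put rough numbers on the throttle advice: on a 1.2-era node the throttle is `compaction_throughput_mb_per_sec` in cassandra.yaml (16 MB/s by default), and it can be changed at runtime with `nodetool setcompactionthroughput <MB/s>`, where 0 disables throttling. The sketch below is only a back-of-the-envelope estimate, assuming cleanup reads each node's full data set (treating 1 GB as 1024 MB) and that the throttle, not disk or CPU, is the limiting factor. The helper names and the example data size are hypothetical.

```python
def cleanup_hours(data_gb_per_node: float, throttle_mb_per_s: float) -> float:
    """Rough estimate of one node's cleanup time if the compaction throttle
    is the bottleneck and cleanup has to read all of the node's data."""
    if throttle_mb_per_s <= 0:
        raise ValueError("0 means unthrottled; this estimate needs a positive throttle")
    seconds = data_gb_per_node * 1024 / throttle_mb_per_s
    return seconds / 3600


def cluster_hours(nodes: int, nodes_at_a_time: int, per_node_hours: float) -> float:
    """Wall-clock time for the whole cluster when cleanup runs in batches."""
    if nodes_at_a_time < 1:
        raise ValueError("nodes_at_a_time must be at least 1")
    batches = -(-nodes // nodes_at_a_time)  # ceiling division
    return batches * per_node_hours


if __name__ == "__main__":
    per_node = cleanup_hours(data_gb_per_node=200, throttle_mb_per_s=16)
    print(f"per node: {per_node:.1f} h")
    print(f"300 nodes, one at a time: {cluster_hours(300, 1, per_node):.0f} h")
    print(f"300 nodes, 3 waves of 100: {cluster_hours(300, 100, per_node):.1f} h")
```

Running more nodes at once does not make any single node's cleanup faster; it only shortens the total wall-clock time. Lowering the throttle protects foreground I/O at the cost of a longer per-node cleanup.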

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Thank you Edward.

I suspect that nodetool cleanup is IO intensive, so running nodetool cleanup concurrently on the entire cluster may significantly impact the IO performance of applications.

Apart from this, do you see any other implications of running nodetool cleanup concurrently on the entire cluster?

Thank you
Emalayan

