Posted to user@cassandra.apache.org by Emalayan Vairavanathan <sv...@yahoo.com> on 2013/06/10 23:00:00 UTC

[Cassandra] Expanding a Cassandra cluster

Hi All,

The DataStax manual suggests that during a Cassandra cluster expansion, an administrator has to run nodetool cleanup on each of the previously existing Cassandra nodes to remove the keys that no longer belong to those nodes. Further, the manual says that the nodetool cleanup task should be run sequentially on the existing Cassandra nodes.

Reference: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-capacity

Here is my problem: I have a very large Cassandra cluster with 100s of nodes and running nodetool cleanup sequentially will take a long time to finish. 

 Questions: a) Can someone tell me about the implications of running nodetool cleanup concurrently on the entire cluster ?
                   b) Will Cassandra automatically take care of removing obsolete keys in future ?


Thank you
Emalayan 

Re: [Cassandra] Running node tool cleanup

Posted by aaron morton <aa...@thelastpickle.com>.
> Does nodetool cleanup run synchronously or asynchronously ?
Async on the server. 

> If it is running asynchronously is there any way to monitor the progress ?
nodetool compactionstats
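
One way to script that check is sketched below. It assumes that nodetool compactionstats prints one row per active task, with the task type (for example Cleanup) in the first column. That layout, the sample text, and the function name are illustrative assumptions, not something confirmed in this thread.

```python
from typing import List


def active_cleanup_rows(compactionstats_output: str) -> List[str]:
    """Return the rows of compactionstats output that describe a cleanup task.

    Assumption: each active task is printed on its own row and the first
    whitespace-separated token of that row is the task type.
    """
    rows = []
    for line in compactionstats_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0].lower() == "cleanup":
            rows.append(line.strip())
    return rows


# Hypothetical sample text, made up for illustration only.
sample = (
    "pending tasks: 1\n"
    "compaction type  keyspace  column family  bytes compacted  bytes total  progress\n"
    "Cleanup  ks1  users  1024  4096  25.00%\n"
)
print(active_cleanup_rows(sample))
```

Under these assumptions, an empty list on a node would suggest that no cleanup is currently running there; check the real output of your Cassandra version before relying on it.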

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/06/2013, at 7:43 AM, Emalayan Vairavanathan <sv...@yahoo.com> wrote:

> Thank you Robert and others for answering my questions.
> 
> I started to play with nodetool and I have a few more questions.
> 
> Does nodetool cleanup run synchronously or asynchronously ?
> 
> If it is running asynchronously is there any way to monitor the progress ?
> 
> Thank you
> Emalayan
> 
> From: Robert Coli <rc...@eventbrite.com>
> To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com> 
> Sent: Thursday, 20 June 2013 10:03 AM
> Subject: Re: [Cassandra] Running node tool cleanup
> 
> On Thu, Jun 20, 2013 at 12:01 AM, Emalayan Vairavanathan
> <sv...@yahoo.com> wrote:
> > 1) What will happen if I run nodetool cleanup immediately after bringing a
> > new node up (i.e. before the key migration process is completed) ?
> >        Will it cause some race conditions ? Or will it result in some part
> > of the space never being reclaimed ?
> 
> As I understand it, the new node isn't responsible for the range until
> the migration process is complete, so I presume cleanup will do
> nothing in this case. This is so the old node can continue to serve
> the range during the bootstrap, and in case of bootstrap failure.
> 
> > 2) After adding a new machine, how can I make sure that the key migration is
> > completed ? Should I run nodetool netstats on all the nodes ? Is there any
> > better way ?
> 
> nodetool ring/netstats and/or grepping the log for the
> completed-bootstrap message.
> 
> =Rob
> 
> 


Re: [Cassandra] Running node tool cleanup

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Thank you Robert and others for answering my questions.

I started to play with nodetool and I have a few more questions.

Does nodetool cleanup run synchronously or asynchronously ?

If it is running asynchronously is there any way to monitor the progress ?

Thank you
Emalayan


________________________________
 From: Robert Coli <rc...@eventbrite.com>
To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com> 
Sent: Thursday, 20 June 2013 10:03 AM
Subject: Re: [Cassandra] Running node tool cleanup
 

On Thu, Jun 20, 2013 at 12:01 AM, Emalayan Vairavanathan
<sv...@yahoo.com> wrote:
> 1) What will happen if I run nodetool cleanup immediately after bringing a
> new node up (i.e. before the key migration process is completed) ?
>         Will it cause some race conditions ? Or will it result in some part
> of the space never being reclaimed ?

As I understand it, the new node isn't responsible for the range until
the migration process is complete, so I presume cleanup will do
nothing in this case. This is so the old node can continue to serve
the range during the bootstrap, and in case of bootstrap failure.

> 2) After adding a new machine, how can I make sure that the key migration is
> completed ? Should I run nodetool netstats on all the nodes ? Is there any
> better way ?

nodetool ring/netstats and/or grepping the log for the
completed-bootstrap message.

=Rob

Re: [Cassandra] Running node tool cleanup

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Jun 20, 2013 at 12:01 AM, Emalayan Vairavanathan
<sv...@yahoo.com> wrote:
> 1) What will happen if I run nodetool cleanup immediately after bringing a
> new node up (i.e. before the key migration process is completed) ?
>         Will it cause some race conditions ? Or will it result in some part
> of the space never being reclaimed ?

As I understand it, the new node isn't responsible for the range until
the migration process is complete, so I presume cleanup will do
nothing in this case. This is so the old node can continue to serve
the range during the bootstrap, and in case of bootstrap failure.

> 2) After adding a new machine, how can I make sure that the key migration is
> completed ? Should I run nodetool netstats on all the nodes ? Is there any
> better way ?

nodetool ring/netstats and/or grepping the log for the
completed-bootstrap message.
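
The netstats part of that check can be sketched as a small parser. It assumes that nodetool netstats prints a line of the form "Mode: <state>", with JOINING while the node is bootstrapping and NORMAL once it has joined the ring. The function names and sample strings are hypothetical.

```python
from typing import Optional


def node_mode(netstats_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line, or None if there is no such line."""
    for line in netstats_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Mode:"):
            return stripped[len("Mode:"):].strip()
    return None


def bootstrap_finished(netstats_output: str) -> bool:
    """Treat the node as done bootstrapping only when it reports NORMAL."""
    return node_mode(netstats_output) == "NORMAL"


# Hypothetical sample text, made up for illustration only.
print(bootstrap_finished("Mode: JOINING\nNot sending any streams.\n"))
```

This only looks at the new node's own view, so it complements rather than replaces checking nodetool ring and the log.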

=Rob

[Cassandra] Running node tool cleanup

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Hi All,

1) What will happen if I run nodetool cleanup immediately after bringing a new node up (i.e. before the key migration process is completed) ?

        Will it cause some race conditions ? Or will it result in some part of the space never being reclaimed ?

2) After adding a new machine, how can I make sure that the key migration is completed ? Should I run nodetool netstats on all the nodes ? Is there any better way ?

Thank you
Emalayan 

Re: [Cassandra] Expanding a Cassandra cluster

Posted by aaron morton <aa...@thelastpickle.com>.
> 1) Is there any implication in running nodetool repair immediately after bringing a new node up (before key migration process is completed) ?
>         Will it cause some race conditions ? Or will it result in some part of the space never being reclaimed ?
Repair is only concerned with data that the node is a replica for, and cleanup is only concerned with data that the node is no longer a replica for.
AFAIK they should be able to run concurrently, but I would avoid it in case there are some edge cases.

> 2) How can I figure out the status of key migration in Cassandra?
Not sure what you mean by key migration. 
If you are talking about a token move or node bootstrap you can get some idea from nodetool compactionstats and nodetool netstats. 

FWIW I think nodetool cleanup is less aggressive than repair. Repair reads all the data and creates a hash; cleanup just reads from one file and writes a new file, dropping rows that no longer belong. It probably uses less CPU than compaction, as it does not merge row fragments.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/06/2013, at 10:48 AM, Emalayan Vairavanathan <sv...@yahoo.com> wrote:

> Thank you all.
> 
> I have two more questions.
>
> 1) Is there any implication in running nodetool repair immediately after bringing a new node up (before key migration process is completed) ?
>         Will it cause some race conditions ? Or will it result in some part of the space never being reclaimed ?
> 
> 2) How can I figure out the status of key migration in Cassandra?
> 
> Thank you
> Emalayan 
> 
> From: Richard Low <ri...@wentnet.com>
> To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com> 
> Sent: Tuesday, 18 June 2013 12:11 AM
> Subject: Re: [Cassandra] Expanding a Cassandra cluster
> 
> On 10 June 2013 22:00, Emalayan Vairavanathan <sv...@yahoo.com> wrote:
> 
>                    b) Will Cassandra automatically take care of removing obsolete keys in future ?
> 
> In a future version Cassandra should automatically clean up for you:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-5051
> 
> Right now though you have to run cleanup eventually or the space will never be reclaimed.
> 
> Richard.
> 
> 


Re: [Cassandra] Expanding a Cassandra cluster

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Thank you all.

I have two more questions.

1) Is there any implication in running nodetool repair immediately after bringing a new node up (before key migration process is completed) ?

        Will it cause some race conditions ? Or will it result in some part of the space never being reclaimed ?

2) How can I figure out the status of key migration in Cassandra?

Thank you
Emalayan 


________________________________
 From: Richard Low <ri...@wentnet.com>
To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com> 
Sent: Tuesday, 18 June 2013 12:11 AM
Subject: Re: [Cassandra] Expanding a Cassandra cluster
 


On 10 June 2013 22:00, Emalayan Vairavanathan <sv...@yahoo.com> wrote:


                   b) Will Cassandra automatically take care of removing obsolete keys in future ?

In a future version Cassandra should automatically clean up for you:

https://issues.apache.org/jira/browse/CASSANDRA-5051

Right now though you have to run cleanup eventually or the space will never be reclaimed.

Richard.

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Richard Low <ri...@wentnet.com>.
On 10 June 2013 22:00, Emalayan Vairavanathan <sv...@yahoo.com> wrote:

> b) Will Cassandra automatically take care of removing
> obsolete keys in future ?
>

In a future version Cassandra should automatically clean up for you:

https://issues.apache.org/jira/browse/CASSANDRA-5051

Right now though you have to run cleanup eventually or the space will never
be reclaimed.

Richard.

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
We run it concurrently on every RF-th node (with RF = 3, we run it in 3 waves). If
a node is busy cleaning up, the client will time out and ask
another node that holds a copy of the data and is not being cleaned up.

"Will nodetool cleanup consume a lot of IO and CPU even though there is
nothing to clean"

Yes, I think so, since you have to check that you have nothing to clean...
I think there is no case that needs a regular cleanup anyway.
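
One reading of the wave approach above can be sketched as follows. It assumes one token per node and replicas placed on consecutive ring neighbours, so taking every RF-th node in ring order puts at most one replica of a range in each wave. The function and node names are hypothetical.

```python
from typing import List, Sequence


def cleanup_waves(ring_order: Sequence[str], replication_factor: int) -> List[List[str]]:
    """Split nodes, given in ring order, into RF waves.

    Wave i holds the nodes at positions i, i + RF, i + 2*RF, ... so that
    ring neighbours end up in different waves.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return [list(ring_order[i::replication_factor]) for i in range(replication_factor)]


nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]  # hypothetical node names
for number, wave in enumerate(cleanup_waves(nodes, 3), start=1):
    print(f"wave {number}: {wave}")
```

Note that when the node count is not a multiple of RF, the first and last nodes of a wave can be ring neighbours (the ring wraps around), so such clusters need an extra wave or a manual adjustment. Multi-datacenter or vnode setups are outside what this sketch covers.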


2013/6/12 Michal Michalski <mi...@opera.com>

>> What will happen if I add nodetool cleanup to run periodically (similar
>> to nodetool repair) ? Will nodetool cleanup consume a lot of IO and CPU even
>> though there is nothing to clean ?
>>
>
> Why would you need to do so?
>
> M.
>
>
>
>> Thank you
>> Emalayan
>>
>>
>> ________________________________
>>   From: Robert Coli <rc...@eventbrite.com>
>> To: user@cassandra.apache.org; Emalayan Vairavanathan <
>> svemalayan@yahoo.com>
>> Sent: Monday, 10 June 2013 5:15 PM
>> Subject: Re: [Cassandra] Expanding a Cassandra cluster
>>
>>
>> On Mon, Jun 10, 2013 at 3:13 PM, Emalayan Vairavanathan
>> <sv...@yahoo.com> wrote:
>>
>>> I suspect that nodetool cleanup is IO intensive. So running nodetool
>>> cleanup
>>> concurrently on the entire cluster may significantly impact the IO
>>> performance of applications.
>>>
>>
>> cleanup is a specific kind of compaction, and as such respects the
>> compaction throughput throttle.
>>
>> The compaction throughput throttle is designed to prevent compaction
>> from negatively impacting the performance of things-not-compaction. If
>> you notice that cleanup compaction on all or most nodes consumes too
>> much i/o, reduce the throttle value.
>>
>> =Rob
>>
>>
>

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Michal Michalski <mi...@opera.com>.
> What will happen if I add nodetool cleanup to run periodically (similar to nodetool repair) ? Will nodetool cleanup consume a lot of IO and CPU even though there is nothing to clean ?

Why would you need to do so?

M.

>
> Thank you
> Emalayan
>
>
> ________________________________
>   From: Robert Coli <rc...@eventbrite.com>
> To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com>
> Sent: Monday, 10 June 2013 5:15 PM
> Subject: Re: [Cassandra] Expanding a Cassandra cluster
>
>
> On Mon, Jun 10, 2013 at 3:13 PM, Emalayan Vairavanathan
> <sv...@yahoo.com> wrote:
>> I suspect that nodetool cleanup is IO intensive. So running nodetool cleanup
>> concurrently on the entire cluster may significantly impact the IO
>> performance of applications.
>
> cleanup is a specific kind of compaction, and as such respects the
> compaction throughput throttle.
>
> The compaction throughput throttle is designed to prevent compaction
> from negatively impacting the performance of things-not-compaction. If
> you notice that cleanup compaction on all or most nodes consumes too
> much i/o, reduce the throttle value.
>
> =Rob
>


Re: [Cassandra] Expanding a Cassandra cluster

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Thank you Robert and all others who replied to my question.

What will happen if I add nodetool cleanup to run periodically (similar to nodetool repair) ? Will nodetool cleanup consume a lot of IO and CPU even though there is nothing to clean ?

Thank you
Emalayan 


________________________________
 From: Robert Coli <rc...@eventbrite.com>
To: user@cassandra.apache.org; Emalayan Vairavanathan <sv...@yahoo.com> 
Sent: Monday, 10 June 2013 5:15 PM
Subject: Re: [Cassandra] Expanding a Cassandra cluster
 

On Mon, Jun 10, 2013 at 3:13 PM, Emalayan Vairavanathan
<sv...@yahoo.com> wrote:
> I suspect that nodetool cleanup is IO intensive. So running nodetool cleanup
> concurrently on the entire cluster may significantly impact the IO
> performance of applications.

cleanup is a specific kind of compaction, and as such respects the
compaction throughput throttle.

The compaction throughput throttle is designed to prevent compaction
from negatively impacting the performance of things-not-compaction. If
you notice that cleanup compaction on all or most nodes consumes too
much i/o, reduce the throttle value.

=Rob

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Jun 10, 2013 at 3:13 PM, Emalayan Vairavanathan
<sv...@yahoo.com> wrote:
> I suspect that nodetool cleanup is IO intensive. So running nodetool cleanup
> concurrently on the entire cluster may significantly impact the IO
> performance of applications.

cleanup is a specific kind of compaction, and as such respects the
compaction throughput throttle.

The compaction throughput throttle is designed to prevent compaction
from negatively impacting the performance of things-not-compaction. If
you notice that cleanup compaction on all or most nodes consumes too
much i/o, reduce the throttle value.
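
As a rough illustration of lowering the throttle across several nodes, the sketch below only builds the command lines and does not run anything. It assumes nodetool is on the PATH, that -h selects the target host, and that setcompactionthroughput takes a value in MB/s. The host addresses and the function name are hypothetical.

```python
from typing import List, Sequence


def throttle_commands(hosts: Sequence[str], mb_per_sec: int) -> List[List[str]]:
    """Build one 'nodetool setcompactionthroughput' argument list per host."""
    if mb_per_sec < 0:
        raise ValueError("mb_per_sec must not be negative")
    return [
        ["nodetool", "-h", host, "setcompactionthroughput", str(mb_per_sec)]
        for host in hosts
    ]


# Hypothetical hosts and value, for illustration only.
for command in throttle_commands(["10.0.0.1", "10.0.0.2"], 8):
    print(" ".join(command))
```

Each argument list could be handed to subprocess.run once the values have been reviewed. A change made this way is a runtime setting, so the configured default would apply again after a restart.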

=Rob

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Emalayan Vairavanathan <sv...@yahoo.com>.
Thank you Edward.

I suspect that nodetool cleanup is IO intensive. So running nodetool cleanup concurrently on the entire cluster may significantly impact the IO performance of applications.

Apart from this, do you see any other implications of running nodetool cleanup concurrently on the entire cluster ?

Thank you
Emalayan


________________________________
 From: Edward Capriolo <ed...@gmail.com>
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>; Emalayan Vairavanathan <sv...@yahoo.com> 
Sent: Monday, 10 June 2013 2:53 PM
Subject: Re: [Cassandra] Expanding a Cassandra cluster
 


You eventually should run cleanup to remove data no longer needed on the node. However, it does not need to be run quickly after a join. You can run it when you get around to it. I would run it on a few nodes at a time until they are all cleaned up.




On Mon, Jun 10, 2013 at 5:00 PM, Emalayan Vairavanathan <sv...@yahoo.com> wrote:

> Hi All,
>
> The DataStax manual suggests that during a Cassandra cluster expansion, an administrator has to run nodetool cleanup on each of the previously existing Cassandra nodes to remove the keys that no longer belong to those nodes. Further, the manual says that the nodetool cleanup task should be run sequentially on the existing Cassandra nodes.
>
> Reference: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-capacity
>
> Here is my problem: I have a very large Cassandra cluster with 100s of nodes and running nodetool cleanup sequentially will take a long time to finish.
>
>  Questions: a) Can someone tell me about the implications of running nodetool cleanup concurrently on the entire cluster ?
>                    b) Will Cassandra automatically take care of removing obsolete keys in future ?
>
> Thank you
> Emalayan

Re: [Cassandra] Expanding a Cassandra cluster

Posted by Edward Capriolo <ed...@gmail.com>.
You eventually should run cleanup to remove data no longer needed on the
node. However, it does not need to be run quickly after a join. You can run
it when you get around to it. I would run it on a few nodes at a time until
they are all cleaned up.


On Mon, Jun 10, 2013 at 5:00 PM, Emalayan Vairavanathan <
svemalayan@yahoo.com> wrote:

> Hi All,
>
> The DataStax manual suggests that during a Cassandra cluster expansion, an
> administrator has to run nodetool cleanup on each of the previously
> existing Cassandra nodes to remove the keys that no longer belong to
> those nodes. Further, the manual says that the nodetool cleanup task
> should be run sequentially on the existing Cassandra nodes.
>
> Reference:
> http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-capacity
>
> Here is my problem: I have a very large Cassandra cluster with 100s of
> nodes and running nodetool cleanup sequentially will take a long time to
> finish.
>
>  Questions: a) So can someone tell me  about the implications of running
> the nodetool cleanup concurrently on the entire cluster ?
>                    b) Will Cassandra automatically take care of removing
> obsolete keys in future ?
>
>
> Thank you
> Emalayan
>