Posted to user@cassandra.apache.org by sai krishnam raju potturi <ps...@gmail.com> on 2015/10/08 23:38:51 UTC

Re : Nodetool Cleanup on multiple nodes in parallel

Hi,

Our Cassandra cluster currently runs DSE 4.6; the underlying Cassandra
version is 2.0.14.

We are planning to add multiple nodes to one of our datacenters. Afterward
we need to run "nodetool cleanup" on the existing nodes, and the cleanup
takes around 45 minutes per node.

The DataStax documentation recommends running "nodetool cleanup" on one node
at a time. Given the size of our datacenter, that would take a very long time.

If we diverted read and write traffic away from that datacenter, could we
run cleanup on multiple nodes in the datacenter in parallel?

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html


Thanks,
Sai
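
The one-node-at-a-time procedure recommended on the DataStax page above could be scripted roughly as below. This is a minimal sketch, not a DataStax tool: it assumes `nodetool` is on the PATH and can reach each node's JMX port via `-h`, and the host names are hypothetical. `nodetool cleanup` normally does not return until the node has finished, so each node starts only after the previous one is done. By default (`dry_run=True`) it only prints the commands.

```python
import subprocess
from typing import List, Sequence


def run_cleanup_sequentially(hosts: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Run `nodetool cleanup` on each existing node, strictly one at a time.

    Returns the commands in the order they were (or would be) run.
    """
    commands: List[List[str]] = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "cleanup"]
        commands.append(cmd)
        if dry_run:
            # Only show what would run; nothing touches the cluster.
            print(" ".join(cmd))
        else:
            # subprocess.run waits for nodetool to exit; check=True raises
            # CalledProcessError and stops the loop if cleanup fails on a node.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical hosts: the nodes that existed before the expansion.
    run_cleanup_sequentially(["cass-dc1-01", "cass-dc1-02", "cass-dc1-03"])
```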

Re: Re : Nodetool Cleanup on multiple nodes in parallel

Posted by sai krishnam raju potturi <ps...@gmail.com>.
Thanks, Jonathan. I see an advantage in doing it one AZ or rack at a time.

On Thu, Oct 8, 2015 at 6:41 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> My hunch is that the bigger your cluster, the less impact it will have, as
> each node takes part in a smaller and smaller percentage of total queries.
> Considering that compaction is always happening, I'd wager that if you've
> got a big cluster (as you say you do), you'll probably be OK running several
> cleanups at a time.
>
> I'd say start with one, see how your performance is impacted (if at all),
> and go from there.
>
> If you're running a proper (rack-aware) snitch, you could probably do an
> entire rack / AZ at a time.

Re: Re : Nodetool Cleanup on multiple nodes in parallel

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
My hunch is that the bigger your cluster, the less impact it will have, as
each node takes part in a smaller and smaller percentage of total queries.
Considering that compaction is always happening, I'd wager that if you've
got a big cluster (as you say you do), you'll probably be OK running several
cleanups at a time.

I'd say start with one, see how your performance is impacted (if at all),
and go from there.

If you're running a proper (rack-aware) snitch, you could probably do an
entire rack / AZ at a time.
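
Doing "an entire rack / AZ at a time" amounts to grouping the existing nodes by rack, running cleanup on every node in one rack concurrently, and moving to the next rack only after that batch finishes. A minimal sketch, assuming NetworkTopologyStrategy with a rack-aware snitch (for example, Ec2Snitch treats each AZ as a rack) and at least as many racks as the replication factor, so that a busy rack holds at most one replica of any token range. The host-to-rack list is hypothetical; in practice it would come from the Rack column of `nodetool status`. As with any such script, `dry_run=True` only prints the planned batches.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def batches_by_rack(nodes: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Group (host, rack) pairs into one batch of hosts per rack.

    Racks keep the order in which they first appear (dicts preserve
    insertion order in Python 3.7+).
    """
    racks: Dict[str, List[str]] = {}
    for host, rack in nodes:
        racks.setdefault(rack, []).append(host)
    return list(racks.values())


def _cleanup(host: str) -> None:
    # Blocks until cleanup on this host finishes; raises on a non-zero exit.
    subprocess.run(["nodetool", "-h", host, "cleanup"], check=True)


def cleanup_rack_by_rack(nodes: Sequence[Tuple[str, str]],
                         dry_run: bool = True) -> List[List[str]]:
    """Clean up all nodes of one rack in parallel, one rack after another."""
    batches = batches_by_rack(nodes)
    for batch in batches:
        if dry_run:
            print("parallel batch:", " ".join(batch))
            continue
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # Leaving the with-block waits for every cleanup in this rack;
            # list() re-raises if any node failed, so the next rack never starts.
            list(pool.map(_cleanup, batch))
    return batches


if __name__ == "__main__":
    # Hypothetical nodes in the datacenter being expanded.
    nodes = [("cass-01", "rack1"), ("cass-02", "rack2"),
             ("cass-03", "rack1"), ("cass-04", "rack3")]
    cleanup_rack_by_rack(nodes)
```

Threads are sufficient here because each worker only waits on an external `nodetool` process.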


On Thu, Oct 8, 2015 at 3:08 PM sai krishnam raju potturi <
pskraju88@gmail.com> wrote:

> We plan to do it during off-peak hours when customer traffic is lower. That
> works out to 10 nodes a day, which is concerning because we will
> eventually need to expand other datacenters as well.
>
> Cleanup is similar to compaction: it is CPU intensive and will affect reads
> if this datacenter is serving traffic. Is running cleanup in parallel
> advisable?

Re: Re : Nodetool Cleanup on multiple nodes in parallel

Posted by sai krishnam raju potturi <ps...@gmail.com>.
We plan to do it during off-peak hours when customer traffic is lower. That
works out to 10 nodes a day, which is concerning because we will eventually
need to expand other datacenters as well.

Cleanup is similar to compaction: it is CPU intensive and will affect reads
if this datacenter is serving traffic. Is running cleanup in parallel
advisable?

On Thu, Oct 8, 2015, 17:53 Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Unless you're close to running out of disk space, what's the harm in it
> taking a while? How big is your DC? At 45 minutes per node, running them
> back to back around the clock, you can do 32 nodes a day. Diverting traffic
> away from a DC just to run cleanup feels like overkill to me.

Re: Re : Nodetool Cleanup on multiple nodes in parallel

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Unless you're close to running out of disk space, what's the harm in it
taking a while? How big is your DC? At 45 minutes per node, running them
back to back around the clock, you can do 32 nodes a day. Diverting traffic
away from a DC just to run cleanup feels like overkill to me.
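
The 32-nodes-a-day figure is the length of a day divided by the per-node cleanup time: 24 * 60 / 45 = 32. The small sketch below generalizes that arithmetic to a limited daily window and to several cleanups at once. It assumes each cleanup still takes 45 minutes when several run in parallel, which may not hold under load; the 450-minute off-peak window is only what Sai's 10-nodes-a-day figure implies (10 * 45 minutes), and the 100-node datacenter is hypothetical.

```python
import math


def nodes_per_day(window_minutes: int, minutes_per_node: int, parallel: int = 1) -> int:
    """Complete cleanups that fit in one daily window, `parallel` nodes at a time."""
    rounds = window_minutes // minutes_per_node  # count only rounds that finish
    return rounds * parallel


def days_to_clean(total_nodes: int, window_minutes: int, minutes_per_node: int,
                  parallel: int = 1) -> int:
    """Days needed to clean up `total_nodes`, rounding a partial day up."""
    per_day = nodes_per_day(window_minutes, minutes_per_node, parallel)
    if per_day == 0:
        raise ValueError("window is shorter than a single cleanup")
    return math.ceil(total_nodes / per_day)


if __name__ == "__main__":
    print(nodes_per_day(24 * 60, 45))               # 32: around the clock, one at a time
    print(nodes_per_day(450, 45))                   # 10: 7.5-hour off-peak window
    print(days_to_clean(100, 24 * 60, 45))          # 4
    print(days_to_clean(100, 450, 45))              # 10
    print(days_to_clean(100, 450, 45, parallel=3))  # 4
```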



On Thu, Oct 8, 2015 at 2:39 PM sai krishnam raju potturi <
pskraju88@gmail.com> wrote:

> Hi,
>
> Our Cassandra cluster currently runs DSE 4.6; the underlying Cassandra
> version is 2.0.14.
>
> We are planning to add multiple nodes to one of our datacenters. Afterward
> we need to run "nodetool cleanup" on the existing nodes, and the cleanup
> takes around 45 minutes per node.
>
> The DataStax documentation recommends running "nodetool cleanup" on one
> node at a time. Given the size of our datacenter, that would take a very
> long time.
>
> If we diverted read and write traffic away from that datacenter, could we
> run cleanup on multiple nodes in the datacenter in parallel?
>
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>
>
> Thanks,
> Sai
>