Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2015/10/08 06:06:08 UTC

Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

Let's say I have 10 nodes and add 5 more. If I fail to run nodetool cleanup,
is excessive data transferred when I add the 6th node? I.e., do the existing
nodes send more data to the 6th node?

The documentation is unclear.  It sounds like the biggest problem is that
the existing data causes things to become unbalanced because "load" is
computed wrong.

But I also think that the excess data will be removed in the next major
compaction, and that nodetool cleanup just triggers a major compaction.

Is my hypothesis correct?

-- 

We’re hiring if you know of any awesome Java DevOps or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>

Re: Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

Posted by Kevin Burton <bu...@spinn3r.com>.
vnodes ... of course!

On Wed, Oct 7, 2015 at 9:09 PM, Sebastian Estevez <
sebastian.estevez@datastax.com> wrote:

> vnodes or single tokens?
>
> On Thu, Oct 8, 2015 at 12:06 AM, Kevin Burton <bu...@spinn3r.com> wrote:
>
>> Let's say I have 10 nodes, I add 5 more, if I fail to run nodetool
>> cleanup, is excessive data transferred when I add the 6th node?  IE do the
>> existing nodes send more data to the 6th node?
>>
>> The documentation is unclear.  It sounds like the biggest problem is that
>> the existing data causes things to become unbalanced because "load" is
>> computed wrong.
>>
>> but I also think that the excessive data will be removed in the next
>> major compaction and that nodetool cleanup just triggers a major compaction.
>>
>> Is my hypothesis correct?
>>
>



Re: Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

Posted by Sebastian Estevez <se...@datastax.com>.
vnodes or single tokens?

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world’s
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Thu, Oct 8, 2015 at 12:06 AM, Kevin Burton <bu...@spinn3r.com> wrote:

> Let's say I have 10 nodes, I add 5 more, if I fail to run nodetool
> cleanup, is excessive data transferred when I add the 6th node?  IE do the
> existing nodes send more data to the 6th node?
>
> The documentation is unclear.  It sounds like the biggest problem is that
> the existing data causes things to become unbalanced because "load" is
> computed wrong.
>
> but I also think that the excessive data will be removed in the next major
> compaction and that nodetool cleanup just triggers a major compaction.
>
> Is my hypothesis correct?
>

Re: Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Oct 7, 2015 at 9:06 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> Let's say I have 10 nodes, I add 5 more, if I fail to run nodetool
> cleanup, is excessive data transferred when I add the 6th node?  IE do the
> existing nodes send more data to the 6th node?
>

No. Streaming only streams ranges that are currently owned by the source
and will be owned by the target.

https://issues.apache.org/jira/browse/CASSANDRA-7764 has some details on
the types of edge cases you are exposed to if you do not run cleanup;
mostly they involve moving a range away from a node and then back onto it.
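The point can be sketched with a toy model (hypothetical names, not
Cassandra internals): the set of ranges streamed to a bootstrapping node
is computed from token-range ownership, not from whatever data happens to
sit on the source's disk, so stale rows left behind by a skipped cleanup
fall outside the streamed ranges and are not re-sent.

```python
# Toy model of bootstrap streaming: only ranges owned by the source NOW
# and owned by the target AFTERWARDS are transferred. Stale data the
# source kept because cleanup was never run plays no part in the math.

def ranges_to_stream(source_owned, target_owned):
    """Ranges as (start, end] token intervals; stream the intersection."""
    return sorted(set(source_owned) & set(target_owned))

# Hypothetical ring: the source owns three ranges; (75, 100] is stale
# data still on its disk that it no longer owns.
source_owned = [(0, 25), (25, 50), (50, 75)]
stale_on_disk = [(75, 100)]        # leftover from a skipped cleanup
target_owned = [(25, 50)]          # the bootstrapping node takes one range

streamed = ranges_to_stream(source_owned, target_owned)
print(streamed)                    # the jointly-owned range only: [(25, 50)]
```

Note that the stale range never enters the computation at all, which is
why skipping cleanup does not inflate bootstrap traffic.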


> but I also think that the excessive data will be removed in the next major
> compaction and that nodetool cleanup just triggers a major compaction.
>

Nothing* removes data-which-doesn't-belong-on-the-node-it's-on except
cleanup compaction (*or scrub).
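A minimal sketch of the distinction (hypothetical names, not Cassandra's
actual code): a major compaction merges SSTables but keeps every live
row, even rows whose tokens the node no longer owns; cleanup is the pass
that checks ownership and drops those rows.

```python
# Toy model: major compaction merges without an ownership check;
# cleanup rewrites data keeping only rows in the node's owned ranges.

owned_ranges = [(0, 50)]                 # tokens this node still owns

def is_owned(token, ranges):
    return any(start < token <= end for start, end in ranges)

def major_compaction(sstables):
    """Merge all SSTables into one; no ownership check is performed."""
    merged = {}
    for sstable in sstables:
        merged.update(sstable)           # later tables win on conflict
    return merged

def cleanup(sstable, ranges):
    """Rewrite an SSTable, keeping only rows in owned token ranges."""
    return {token: row for token, row in sstable.items()
            if is_owned(token, ranges)}

# Token 80 is stale data left behind after this node gave up (50, 100].
sstables = [{10: "a", 80: "stale"}, {30: "b"}]
after_major = major_compaction(sstables)
after_cleanup = cleanup(after_major, owned_ranges)
print(sorted(after_major))     # [10, 30, 80] -- major keeps the stale row
print(sorted(after_cleanup))   # [10, 30] -- cleanup drops token 80
```

This is why the "major compaction will remove it" hypothesis in the
original question doesn't hold: compaction alone never consults ownership.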

=Rob