Posted to user@cassandra.apache.org by John Watson <jo...@disqus.com> on 2013/04/25 21:57:50 UTC

Adding nodes in 1.2 with vnodes requires huge disks

After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running
upgradesstables, I figured it would be safe to start adding nodes to the
cluster. Guess not?

It seems when new nodes join, they are streamed *all* sstables in the
cluster.

https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png

The gray line machine ran out of disk space, which for some reason cascaded
into errors in the cluster about 'no host id' when trying to store hints
for it (even though it hadn't joined yet).
The purple line machine, I just stopped the joining process because the
main cluster was dropping mutation messages at this point on a few nodes
(and it still had dozens of sstables to stream.)

I followed this:
http://www.datastax.com/docs/1.2/operations/add_replace_nodes

Is there something missing in that documentation?

Thanks,

John

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by John Watson <jo...@disqus.com>.
Opened a ticket:

https://issues.apache.org/jira/browse/CASSANDRA-5525


On Mon, Apr 29, 2013 at 2:24 AM, aaron morton <aa...@thelastpickle.com> wrote:

> is this understanding correct "we had a 12 node cluster with 256 vnodes on
> each node (upgraded from 1.1), we added two additional nodes that streamed
> so much data (600+GB when other nodes had 150-200GB) during the joining
> phase that they filled their local disks and had to be killed" ?
>
> Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and
> update the thread with the ticket number.
>
> Can you show the output from nodetool status so we can get a feel for the
> ring?
> Can you include the logs from one of the nodes that failed to join ?
>
> Thanks
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/04/2013, at 10:01 AM, John Watson <jo...@disqus.com> wrote:
>
> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>>  We're going to try running a shuffle before adding a new node again...
>>> maybe that will help
>>>
>> I don't think it will hurt, but I doubt it will help.
>>
>
> We had to bail on shuffle since we need to add capacity ASAP and not in 20
> days.
>
>
>>
>>    It seems when new nodes join, they are streamed *all* sstables in the
>>>> cluster.
>>>>
>>>>
>>>>
>>>> How many nodes did you join, what was the num_tokens ?
>> Did you notice streaming from all nodes (in the logs) or are you saying
>> this in response to the cluster load increasing ?
>>
>>
> Was only adding 2 nodes at the time (planning to add a total of 12.)
> Starting with a cluster of 12, but now 11 since 1 node entered some weird
> state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
> Yes, nearly all current nodes were streaming to the new ones (which was
> great until disk space was an issue.)
>
>>     The purple line machine, I just stopped the joining process because
>>>> the main cluster was dropping mutation messages at this point on a few
>>>> nodes (and it still had dozens of sstables to stream.)
>>>>
>>>> Which were the new nodes ?
>> Can you show the output from nodetool status?
>>
>>
> The new nodes are the purple and gray lines above all the others.
>
> nodetool status doesn't show joining nodes. I think I saw a bug already
> filed for this but I can't seem to find it.
>
>
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/04/2013, at 9:35 AM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>
>> I believe that "nodetool rebuild" is used to add a new datacenter, not
>> just a new host to an existing cluster.  Is that what you ran to add the
>> node?
>>
>> -Bryan
>>
>>
>>
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:
>>
>>> Small relief we're not the only ones that had this issue.
>>>
>>> We're going to try running a shuffle before adding a new node again...
>>> maybe that will help
>>>
>>> - John
>>>
>>>
>>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <
>>> fsobral@igcorp.com.br> wrote:
>>>
>>>> I am using the same version and observed something similar.
>>>>
>>>> I've added a new node, but the instructions from Datastax did not work
>>>> for me. Then I ran "nodetool rebuild" on the new node. After this command
>>>> finished, it contained twice the load of the other nodes. Even when I
>>>> ran "nodetool cleanup" on the older nodes, the situation was the same.
>>>>
>>>> The problem only seemed to disappear when "nodetool repair" was applied
>>>> to all nodes.
>>>>
>>>> Regards,
>>>> Francisco Sobral.
>>>>
>>>>
>>>>
>>>>
>>>> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>>>>
>>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and
>>>> running upgradesstables, I figured it would be safe to start adding nodes
>>>> to the cluster. Guess not?
>>>>
>>>> It seems when new nodes join, they are streamed *all* sstables in the
>>>> cluster.
>>>>
>>>>
>>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>>>
>>>> The gray line machine ran out of disk space, which for some reason
>>>> cascaded into errors in the cluster about 'no host id' when trying to store
>>>> hints for it (even though it hadn't joined yet).
>>>> The purple line machine, I just stopped the joining process because the
>>>> main cluster was dropping mutation messages at this point on a few nodes
>>>> (and it still had dozens of sstables to stream.)
>>>>
>>>> I followed this:
>>>> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>>>
>>>> Is there something missing in that documentation?
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by aaron morton <aa...@thelastpickle.com>.
is this understanding correct "we had a 12 node cluster with 256 vnodes on each node (upgraded from 1.1), we added two additional nodes that streamed so much data (600+GB when other nodes had 150-200GB) during the joining phase that they filled their local disks and had to be killed" ?

Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the thread with the ticket number.

Can you show the output from nodetool status so we can get a feel for the ring?
Can you include the logs from one of the nodes that failed to join ? 

Thanks

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/04/2013, at 10:01 AM, John Watson <jo...@disqus.com> wrote:

> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> We're going to try running a shuffle before adding a new node again... maybe that will help
> 
> I don't think it will hurt, but I doubt it will help.
> 
> We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.
>  
> 
>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
> 
>> 
> 
> How many nodes did you join, what was the num_tokens ? 
> Did you notice streaming from all nodes (in the logs) or are you saying this in response to the cluster load increasing ? 
> 
>  
> Was only adding 2 nodes at the time (planning to add a total of 12.) Starting with a cluster of 12, but now 11 since 1 node entered some weird state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
> Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue.)
>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
> Which were the new nodes ?
> Can you show the output from nodetool status?
> 
> 
> The new nodes are the purple and gray lines above all the others.
> 
> nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.
>  
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/04/2013, at 9:35 AM, Bryan Talbot <bt...@aeriagames.com> wrote:
> 
>> I believe that "nodetool rebuild" is used to add a new datacenter, not just a new host to an existing cluster.  Is that what you ran to add the node?
>> 
>> -Bryan
>> 
>> 
>> 
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:
>> Small relief we're not the only ones that had this issue.
>> 
>> We're going to try running a shuffle before adding a new node again... maybe that will help
>> 
>> - John
>> 
>> 
>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <fs...@igcorp.com.br> wrote:
>> I am using the same version and observed something similar.
>> 
>> I've added a new node, but the instructions from Datastax did not work for me. Then I ran "nodetool rebuild" on the new node. After this command finished, it contained twice the load of the other nodes. Even when I ran "nodetool cleanup" on the older nodes, the situation was the same.
>> 
>> The problem only seemed to disappear when "nodetool repair" was applied to all nodes.
>> 
>> Regards,
>> Francisco Sobral.
>> 
>> 
>> 
>> 
>> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>> 
>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?
>>> 
>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>>> 
>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>> 
>>> The gray line machine ran out of disk space, which for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet).
>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
>>> 
>>> I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>> 
>>> Is there something missing in that documentation?
>>> 
>>> Thanks,
>>> 
>>> John
>> 
>> 
>> 
> 
> 


Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by John Watson <jo...@disqus.com>.
They were all restarted a couple times after adding 'num_tokens: 256' to
cassandra.yaml.

Yes, and nodetool ring became 'unusable' due to all the new tokens.
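
For a rough sense of why the listing gets that unwieldy, with only the figures
from this thread (12 hosts, num_tokens: 256), nodetool ring prints one row per
token. A back-of-the-envelope sketch, not output from this cluster:

    # Rough arithmetic only; 12 hosts and num_tokens: 256 are the figures
    # mentioned in this thread, not measurements from the cluster.
    hosts = 12
    tokens_per_host = 256
    print(hosts * tokens_per_host)  # 3072 rows in the 'nodetool ring' listing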


On Mon, Apr 29, 2013 at 10:24 AM, Sam Overton <sa...@acunu.com> wrote:

> Did you update num_tokens on the existing hosts and restart them, before
> you tried bootstrapping in the new node? If the new node tried to stream
> all the data in the cluster then this would be consistent with you having
> missed that step.
>
> You should see "Calculating new tokens" in the logs of the existing hosts
> if you performed that step correctly, and "nodetool ring" should show that
> the existing hosts each have 256 tokens which are contiguous in the ring.
>
> If you missed this step then the new node will be taking 256 tokens in a
> ring with only N tokens (1 per existing host) and so will end up with
> 256/(256+N) of the data (almost all of it).
>
>
>
> On 28 April 2013 23:01, John Watson <jo...@disqus.com> wrote:
>
>> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>>  We're going to try running a shuffle before adding a new node again...
>>>> maybe that will help
>>>>
>>> I don't think it will hurt, but I doubt it will help.
>>>
>>
>> We had to bail on shuffle since we need to add capacity ASAP and not in
>> 20 days.
>>
>>
>>>
>>>    It seems when new nodes join, they are streamed *all* sstables in
>>>>> the cluster.
>>>>>
>>>>>
>>>>>
>>>>> How many nodes did you join, what was the num_tokens ?
>>> Did you notice streaming from all nodes (in the logs) or are you saying
>>> this in response to the cluster load increasing ?
>>>
>>>
>> Was only adding 2 nodes at the time (planning to add a total of 12.)
>> Starting with a cluster of 12, but now 11 since 1 node entered some weird
>> state when one of the new nodes ran out of disk space.
>> num_tokens is set to 256 on all nodes.
>> Yes, nearly all current nodes were streaming to the new ones (which was
>> great until disk space was an issue.)
>>
>>>     The purple line machine, I just stopped the joining process because
>>>>> the main cluster was dropping mutation messages at this point on a few
>>>>> nodes (and it still had dozens of sstables to stream.)
>>>>>
>>>>> Which were the new nodes ?
>>> Can you show the output from nodetool status?
>>>
>>>
>> The new nodes are the purple and gray lines above all the others.
>>
>> nodetool status doesn't show joining nodes. I think I saw a bug already
>> filed for this but I can't seem to find it.
>>
>>
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 27/04/2013, at 9:35 AM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>>
>>> I believe that "nodetool rebuild" is used to add a new datacenter, not
>>> just a new host to an existing cluster.  Is that what you ran to add the
>>> node?
>>>
>>> -Bryan
>>>
>>>
>>>
>>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:
>>>
>>>> Small relief we're not the only ones that had this issue.
>>>>
>>>> We're going to try running a shuffle before adding a new node again...
>>>> maybe that will help
>>>>
>>>> - John
>>>>
>>>>
>>>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <
>>>> fsobral@igcorp.com.br> wrote:
>>>>
>>>>> I am using the same version and observed something similar.
>>>>>
>>>>> I've added a new node, but the instructions from Datastax did not work
>>>>> for me. Then I ran "nodetool rebuild" on the new node. After this command
>>>>> finished, it contained twice the load of the other nodes. Even when I
>>>>> ran "nodetool cleanup" on the older nodes, the situation was the same.
>>>>>
>>>>> The problem only seemed to disappear when "nodetool repair" was
>>>>> applied to all nodes.
>>>>>
>>>>> Regards,
>>>>> Francisco Sobral.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>>>>>
>>>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and
>>>>> running upgradesstables, I figured it would be safe to start adding nodes
>>>>> to the cluster. Guess not?
>>>>>
>>>>> It seems when new nodes join, they are streamed *all* sstables in the
>>>>> cluster.
>>>>>
>>>>>
>>>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>>>>
>>>>> The gray line machine ran out of disk space, which for some reason
>>>>> cascaded into errors in the cluster about 'no host id' when trying to store
>>>>> hints for it (even though it hadn't joined yet).
>>>>> The purple line machine, I just stopped the joining process because
>>>>> the main cluster was dropping mutation messages at this point on a few
>>>>> nodes (and it still had dozens of sstables to stream.)
>>>>>
>>>>> I followed this:
>>>>> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>>>>
>>>>> Is there something missing in that documentation?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
> --
> Sam Overton
> Acunu | http://www.acunu.com | @acunu
>

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by Sam Overton <sa...@acunu.com>.
Did you update num_tokens on the existing hosts and restart them, before
you tried bootstrapping in the new node? If the new node tried to stream
all the data in the cluster then this would be consistent with you having
missed that step.

You should see "Calculating new tokens" in the logs of the existing hosts
if you performed that step correctly, and "nodetool ring" should show that
the existing hosts each have 256 tokens which are contiguous in the ring.
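
If eyeballing that much nodetool ring output is impractical, a small helper
can count tokens per host. The script below is hypothetical, not part of
Cassandra, and it assumes each data row of the output begins with the node's
IP address; check the format your nodetool version prints before relying on it.

    # Hypothetical helper: count tokens per host from 'nodetool ring' output,
    # e.g.  nodetool ring | python count_tokens.py
    # Assumes each data row starts with the node's IPv4 address; adjust the
    # parsing if your nodetool output is formatted differently.
    import sys
    from collections import Counter

    counts = Counter()
    for line in sys.stdin:
        fields = line.split()
        if fields and fields[0].count(".") == 3:  # crude IPv4 check, skips headers
            counts[fields[0]] += 1

    for host, tokens in sorted(counts.items()):
        print("%s %d" % (host, tokens))  # upgraded hosts should each show 256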

If you missed this step then the new node will be taking 256 tokens in a
ring with only N tokens (1 per existing host) and so will end up with
256/(256+N) of the data (almost all of it).
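
With the numbers from this thread (N = 12 existing hosts), that works out to
roughly 95% of the data. A quick sketch of the arithmetic, nothing more:

    # 256/(256 + N) from the explanation above, with N = 12 taken from this thread.
    existing_hosts = 12        # one token each if num_tokens was never applied
    new_node_tokens = 256
    share = new_node_tokens / float(new_node_tokens + existing_hosts)
    print("new node would own ~%.1f%% of the ring" % (share * 100))  # ~95.5%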



On 28 April 2013 23:01, John Watson <jo...@disqus.com> wrote:

> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>>  We're going to try running a shuffle before adding a new node again...
>>> maybe that will help
>>>
>> I don't think it will hurt, but I doubt it will help.
>>
>
> We had to bail on shuffle since we need to add capacity ASAP and not in 20
> days.
>
>
>>
>>    It seems when new nodes join, they are streamed *all* sstables in the
>>>> cluster.
>>>>
>>>>
>>>>
>>>> How many nodes did you join, what was the num_tokens ?
>> Did you notice streaming from all nodes (in the logs) or are you saying
>> this in response to the cluster load increasing ?
>>
>>
> Was only adding 2 nodes at the time (planning to add a total of 12.)
> Starting with a cluster of 12, but now 11 since 1 node entered some weird
> state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
> Yes, nearly all current nodes were streaming to the new ones (which was
> great until disk space was an issue.)
>
>>     The purple line machine, I just stopped the joining process because
>>>> the main cluster was dropping mutation messages at this point on a few
>>>> nodes (and it still had dozens of sstables to stream.)
>>>>
>>>> Which were the new nodes ?
>> Can you show the output from nodetool status?
>>
>>
> The new nodes are the purple and gray lines above all the others.
>
> nodetool status doesn't show joining nodes. I think I saw a bug already
> filed for this but I can't seem to find it.
>
>
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/04/2013, at 9:35 AM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>
>> I believe that "nodetool rebuild" is used to add a new datacenter, not
>> just a new host to an existing cluster.  Is that what you ran to add the
>> node?
>>
>> -Bryan
>>
>>
>>
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:
>>
>>> Small relief we're not the only ones that had this issue.
>>>
>>> We're going to try running a shuffle before adding a new node again...
>>> maybe that will help
>>>
>>> - John
>>>
>>>
>>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <
>>> fsobral@igcorp.com.br> wrote:
>>>
>>>> I am using the same version and observed something similar.
>>>>
>>>> I've added a new node, but the instructions from Datastax did not work
>>>> for me. Then I ran "nodetool rebuild" on the new node. After this command
>>>> finished, it contained twice the load of the other nodes. Even when I
>>>> ran "nodetool cleanup" on the older nodes, the situation was the same.
>>>>
>>>> The problem only seemed to disappear when "nodetool repair" was applied
>>>> to all nodes.
>>>>
>>>> Regards,
>>>> Francisco Sobral.
>>>>
>>>>
>>>>
>>>>
>>>> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>>>>
>>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and
>>>> running upgradesstables, I figured it would be safe to start adding nodes
>>>> to the cluster. Guess not?
>>>>
>>>> It seems when new nodes join, they are streamed *all* sstables in the
>>>> cluster.
>>>>
>>>>
>>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>>>
>>>> The gray line machine ran out of disk space, which for some reason
>>>> cascaded into errors in the cluster about 'no host id' when trying to store
>>>> hints for it (even though it hadn't joined yet).
>>>> The purple line machine, I just stopped the joining process because the
>>>> main cluster was dropping mutation messages at this point on a few nodes
>>>> (and it still had dozens of sstables to stream.)
>>>>
>>>> I followed this:
>>>> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>>>
>>>> Is there something missing in that documentation?
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
Sam Overton
Acunu | http://www.acunu.com | @acunu

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by John Watson <jo...@disqus.com>.
On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:

>  We're going to try running a shuffle before adding a new node again...
>> maybe that will help
>>
> I don't think it will hurt, but I doubt it will help.
>

We had to bail on shuffle since we need to add capacity ASAP and not in 20
days.


>
>    It seems when new nodes join, they are streamed *all* sstables in the
>>> cluster.
>>>
>>>
>>>
>>> How many nodes did you join, what was the num_tokens ?
> Did you notice streaming from all nodes (in the logs) or are you saying
> this in response to the cluster load increasing ?
>
>
Was only adding 2 nodes at the time (planning to add a total of 12.)
Starting with a cluster of 12, but now 11 since 1 node entered some weird
state when one of the new nodes ran out of disk space.
num_tokens is set to 256 on all nodes.
Yes, nearly all current nodes were streaming to the new ones (which was
great until disk space was an issue.)

>     The purple line machine, I just stopped the joining process because
>>> the main cluster was dropping mutation messages at this point on a few
>>> nodes (and it still had dozens of sstables to stream.)
>>>
>>> Which were the new nodes ?
> Can you show the output from nodetool status?
>
>
The new nodes are the purple and gray lines above all the others.

nodetool status doesn't show joining nodes. I think I saw a bug already
filed for this but I can't seem to find it.


>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/04/2013, at 9:35 AM, Bryan Talbot <bt...@aeriagames.com> wrote:
>
> I believe that "nodetool rebuild" is used to add a new datacenter, not
> just a new host to an existing cluster.  Is that what you ran to add the
> node?
>
> -Bryan
>
>
>
> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:
>
>> Small relief we're not the only ones that had this issue.
>>
>> We're going to try running a shuffle before adding a new node again...
>> maybe that will help
>>
>> - John
>>
>>
>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <
>> fsobral@igcorp.com.br> wrote:
>>
>>> I am using the same version and observed something similar.
>>>
>>> I've added a new node, but the instructions from Datastax did not work
>>> for me. Then I ran "nodetool rebuild" on the new node. After this command
>>> finished, it contained twice the load of the other nodes. Even when I
>>> ran "nodetool cleanup" on the older nodes, the situation was the same.
>>>
>>> The problem only seemed to disappear when "nodetool repair" was applied
>>> to all nodes.
>>>
>>> Regards,
>>> Francisco Sobral.
>>>
>>>
>>>
>>>
>>> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>>>
>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and
>>> running upgradesstables, I figured it would be safe to start adding nodes
>>> to the cluster. Guess not?
>>>
>>> It seems when new nodes join, they are streamed *all* sstables in the
>>> cluster.
>>>
>>>
>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>>
>>> The gray the line machine ran out disk space and for some reason
>>> cascaded into errors in the cluster about 'no host id' when trying to store
>>> hints for it (even though it hadn't joined yet).
>>> The purple line machine, I just stopped the joining process because the
>>> main cluster was dropping mutation messages at this point on a few nodes
>>> (and it still had dozens of sstables to stream.)
>>>
>>> I followed this:
>>> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>>
>>> Is there something missing in that documentation?
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>
>>>
>>
>
>

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by aaron morton <aa...@thelastpickle.com>.
> We're going to try running a shuffle before adding a new node again... maybe that will help
I don't think it will hurt, but I doubt it will help.


>> It seems when new nodes join, they are streamed *all* sstables in the cluster.

> 

How many nodes did you join, what was the num_tokens ? 
Did you notice streaming from all nodes (in the logs) or are you saying this in response to the cluster load increasing ? 

>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
Which were the new nodes ?
Can you show the output from nodetool status?


Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/04/2013, at 9:35 AM, Bryan Talbot <bt...@aeriagames.com> wrote:

> I believe that "nodetool rebuild" is used to add a new datacenter, not just a new host to an existing cluster.  Is that what you ran to add the node?
> 
> -Bryan
> 
> 
> 
> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:
> Small relief we're not the only ones that had this issue.
> 
> We're going to try running a shuffle before adding a new node again... maybe that will help
> 
> - John
> 
> 
> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <fs...@igcorp.com.br> wrote:
> I am using the same version and observed something similar.
> 
> I've added a new node, but the instructions from Datastax did not work for me. Then I ran "nodetool rebuild" on the new node. After this command finished, it contained twice the load of the other nodes. Even when I ran "nodetool cleanup" on the older nodes, the situation was the same.
> 
> The problem only seemed to disappear when "nodetool repair" was applied to all nodes.
> 
> Regards,
> Francisco Sobral.
> 
> 
> 
> 
> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
> 
>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?
>> 
>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>> 
>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>> 
>> The gray line machine ran out of disk space, which for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet).
>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
>> 
>> I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>> 
>> Is there something missing in that documentation?
>> 
>> Thanks,
>> 
>> John
> 
> 
> 


Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by Bryan Talbot <bt...@aeriagames.com>.
I believe that "nodetool rebuild" is used to add a new datacenter, not just
a new host to an existing cluster.  Is that what you ran to add the node?

-Bryan



On Fri, Apr 26, 2013 at 1:27 PM, John Watson <jo...@disqus.com> wrote:

> Small relief we're not the only ones that had this issue.
>
> We're going to try running a shuffle before adding a new node again...
> maybe that will help
>
> - John
>
>
> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <
> fsobral@igcorp.com.br> wrote:
>
>> I am using the same version and observed something similar.
>>
>> I've added a new node, but the instructions from Datastax did not work
>> for me. Then I ran "nodetool rebuild" on the new node. After this command
>> finished, it contained twice the load of the other nodes. Even when I
>> ran "nodetool cleanup" on the older nodes, the situation was the same.
>>
>> The problem only seemed to disappear when "nodetool repair" was applied
>> to all nodes.
>>
>> Regards,
>> Francisco Sobral.
>>
>>
>>
>>
>> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>>
>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running
>> upgradesstables, I figured it would be safe to start adding nodes to the
>> cluster. Guess not?
>>
>> It seems when new nodes join, they are streamed *all* sstables in the
>> cluster.
>>
>>
>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>
>> The gray line machine ran out of disk space, which for some reason cascaded
>> into errors in the cluster about 'no host id' when trying to store hints
>> for it (even though it hadn't joined yet).
>> The purple line machine, I just stopped the joining process because the
>> main cluster was dropping mutation messages at this point on a few nodes
>> (and it still had dozens of sstables to stream.)
>>
>> I followed this:
>> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>
>> Is there something missing in that documentation?
>>
>> Thanks,
>>
>> John
>>
>>
>>
>

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by John Watson <jo...@disqus.com>.
Small relief we're not the only ones that had this issue.

We're going to try running a shuffle before adding a new node again...
maybe that will help

- John


On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <
fsobral@igcorp.com.br> wrote:

> I am using the same version and observed something similar.
>
> I've added a new node, but the instructions from Datastax did not work for
> me. Then I ran "nodetool rebuild" on the new node. After this command
> finished, it contained twice the load of the other nodes. Even when I
> ran "nodetool cleanup" on the older nodes, the situation was the same.
>
> The problem only seemed to disappear when "nodetool repair" was applied to
> all nodes.
>
> Regards,
> Francisco Sobral.
>
>
>
>
> On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:
>
> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running
> upgradesstables, I figured it would be safe to start adding nodes to the
> cluster. Guess not?
>
> It seems when new nodes join, they are streamed *all* sstables in the
> cluster.
>
>
> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>
> The gray line machine ran out of disk space, which for some reason cascaded
> into errors in the cluster about 'no host id' when trying to store hints
> for it (even though it hadn't joined yet).
> The purple line machine, I just stopped the joining process because the
> main cluster was dropping mutation messages at this point on a few nodes
> (and it still had dozens of sstables to stream.)
>
> I followed this:
> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>
> Is there something missing in that documentation?
>
> Thanks,
>
> John
>
>
>

Re: Adding nodes in 1.2 with vnodes requires huge disks

Posted by Francisco Nogueira Calmon Sobral <fs...@igcorp.com.br>.
I am using the same version and observed something similar.

I've added a new node, but the instructions from Datastax did not work for me. Then I ran "nodetool rebuild" on the new node. After this command finished, it contained twice the load of the other nodes. Even when I ran "nodetool cleanup" on the older nodes, the situation was the same.

The problem only seemed to disappear when "nodetool repair" was applied to all nodes.

Regards,
Francisco Sobral.




On Apr 25, 2013, at 4:57 PM, John Watson <jo...@disqus.com> wrote:

> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?
> 
> It seems when new nodes join, they are streamed *all* sstables in the cluster.
> 
> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
> 
> The gray line machine ran out of disk space, which for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet).
> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
> 
> I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes
> 
> Is there something missing in that documentation?
> 
> Thanks,
> 
> John