Posted to user@cassandra.apache.org by Pradeep Chhetri <pr...@stashaway.com> on 2017/12/17 11:19:54 UTC

Problem adding a new node to a cluster

Hello all,

I am trying to add a 4th node to a 3-node cluster that uses SimpleSnitch,
but the new node has been stuck in the Joining state for the last 20
hours. We have around 10 GB of data per node, with RF 3.

It's mostly stuck in the redistributing index summaries phase.

Here are the logs:

https://gist.github.com/chhetripradeep/37e4f232ddf0dd3b830091ca9829416d

# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UJ  10.42.187.43   9.73 GiB   256          ?                 36384dc5-a183-4a5b-ae2d-ee67c897df3d  rack1
UN  10.42.106.184  9.95 GiB   256          100.0%            42cd09e9-8efb-472f-ace6-c7bb98634887  rack1
UN  10.42.169.195  10.35 GiB  256          100.0%            9fcc99a1-6334-4df8-818d-b097b1920bb9  rack1
UN  10.42.209.245  8.54 GiB   256          100.0%            9b99d5d8-818e-4741-9533-259d0fc0e16d  rack1
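On a larger ring it can help to pick the joining nodes out of the status output by script rather than by eye. A minimal sketch, assuming `nodetool status` prints one line per node with the two-letter state code (UN, UJ, ...) in the first column and the address in the second; the function name `joining_nodes` is hypothetical:

```shell
#!/bin/sh
# Print the address of every node whose state code is UJ (Up/Joining).
# Reads `nodetool status` output on stdin; assumes the state code is the
# first whitespace-separated field and the address is the second.
joining_nodes() {
  awk '$1 == "UJ" { print $2 }'
}

# Only query the cluster when nodetool is actually installed.
if command -v nodetool >/dev/null 2>&1; then
  nodetool status | joining_nodes
fi
```

Against the status output in this post it would print only 10.42.187.43; header lines such as "--  Address ..." are skipped because their first field is not UJ.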

I'm not sure what is going on here; it would be very helpful if someone
could help identify the issue.

Thank you.

Re: Problem adding a new node to a cluster

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Definitely upgrade to 3.11.1.
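The reply leaves the mechanics of the rolling upgrade open. A commonly used per-node sequence is to drain the node, stop it, install the new version, start it, and wait until it shows as UN before moving to the next node; once every node runs the new version, `nodetool upgradesstables` rewrites any SSTables still in an older format. This is a sketch under stated assumptions, not a verified procedure for 3.9 to 3.11.1: it assumes an init script named `cassandra`, uses a hypothetical placeholder `install_new_cassandra` for the install step, and the upgrade notes in NEWS.txt for the target release should be checked first.

```shell
#!/bin/sh
# Sketch of the per-node steps of a rolling upgrade, one node at a time.
# ASSUMPTIONS: the init script is called "cassandra"; install_new_cassandra
# is a hypothetical placeholder for your real package or tarball step.

# Hypothetical placeholder: fails on purpose until replaced.
install_new_cassandra() {
  echo "replace with your Cassandra 3.11.1 install step" >&2
  return 1
}

upgrade_this_node() {
  nodetool drain || return 1         # stop accepting writes, flush all tables
  sudo service cassandra stop || return 1
  install_new_cassandra || return 1
  sudo service cassandra start
}

# Succeeds if `nodetool status` output on stdin lists address $1 as UN
# (Up/Normal); use it to confirm a node is back before moving on.
is_up_normal() {
  awk -v ip="$1" '$1 == "UN" && $2 == ip { found = 1 } END { exit !found }'
}

# After every node runs the new version, rewrite old-format SSTables:
#   nodetool upgradesstables
```

For example, `nodetool status | is_up_normal 10.42.106.184 && echo ready` succeeds only once that node is reported as UN again.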
On Sun, Dec 17, 2017 at 8:54 PM Pradeep Chhetri <pr...@stashaway.com>
wrote:

> Hello Kurt,
>
> I realized the issue was caused by a shortage of RAM. I bumped up the
> machine's memory and the node bootstrap started, but this time I hit this
> Cassandra 3.9 bug:
>
> https://issues.apache.org/jira/browse/CASSANDRA-12905
>
> I tried running nodetool bootstrap resume multiple times, but every time it
> fails with an exception after completing around 963%:
>
> https://gist.github.com/chhetripradeep/93567ad24c44ba72d0753d4088a10ce4
>
> Do you think there is some workaround for this, or do you suggest
> upgrading to v3.11, which has this fix?
>
> Also, can we just upgrade Cassandra from 3.9 to 3.11 in a rolling
> fashion, or is there something we need to take care of during the upgrade?
>
> Thanks.

Re: Problem adding a new node to a cluster

Posted by Pradeep Chhetri <pr...@stashaway.com>.
Hello Kurt,

I realized the issue was caused by a shortage of RAM. I bumped up the
machine's memory and the node bootstrap started, but this time I hit this
Cassandra 3.9 bug:

https://issues.apache.org/jira/browse/CASSANDRA-12905

I tried running nodetool bootstrap resume multiple times, but every time it
fails with an exception after completing around 963%:

https://gist.github.com/chhetripradeep/93567ad24c44ba72d0753d4088a10ce4

Do you think there is some workaround for this, or do you suggest upgrading
to v3.11, which has this fix?

Also, can we just upgrade Cassandra from 3.9 to 3.11 in a rolling fashion,
or is there something we need to take care of during the upgrade?

Thanks.

On Mon, Dec 18, 2017 at 5:45 AM, kurt greaves <ku...@instaclustr.com> wrote:

> You haven't provided enough logs for us to really tell what's wrong. I
> suggest running *nodetool netstats | grep -v 100%* to see if any streams
> are still ongoing, and also running *nodetool compactionstats -H* to see if
> there are any index builds the node might be waiting for before joining
> the ring.
>
> If neither of those provides any useful information, send us the full
> system.log and debug.log.

Re: Problem adding a new node to a cluster

Posted by kurt greaves <ku...@instaclustr.com>.
You haven't provided enough logs for us to really tell what's wrong. I
suggest running *nodetool netstats | grep -v 100%* to see if any streams
are still ongoing, and also running *nodetool compactionstats -H* to see if
there are any index builds the node might be waiting for before joining
the ring.
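The two checks can be wrapped in a small script to rerun while the node is joining. A minimal sketch: the function names `unfinished_streams` and `check_join_progress` are hypothetical, and it assumes `nodetool` is on the PATH of a host that can reach the joining node.

```shell
#!/bin/sh
# Hypothetical helper: drop lines reporting 100% from `nodetool netstats`
# output on stdin, leaving transfers that are still in progress. Note that
# header and summary lines without "100%" are kept as well.
unfinished_streams() {
  grep -v '100%'
}

# Run both checks suggested above.
check_join_progress() {
  echo "== Streams not yet at 100% =="
  nodetool netstats | unfinished_streams
  echo "== Pending compactions / index builds =="
  nodetool compactionstats -H
}

# Only query the node when nodetool is actually installed.
if command -v nodetool >/dev/null 2>&1; then
  check_join_progress
fi
```

The `grep -v '100%'` filter is deliberately crude: it hides finished transfers but keeps every other line, so the output is meant to be read rather than counted.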

If neither of those provides any useful information, send us the full
system.log and debug.log.

On 17 December 2017 at 11:19, Pradeep Chhetri <pr...@stashaway.com> wrote:

> Hello all,
>
> I am trying to add a 4th node to a 3-node cluster that uses SimpleSnitch,
> but the new node has been stuck in the Joining state for the last 20
> hours. We have around 10 GB of data per node, with RF 3.
>
> It's mostly stuck in the redistributing index summaries phase.
>
> Here are the logs:
>
> https://gist.github.com/chhetripradeep/37e4f232ddf0dd3b830091ca9829416d
>
> # nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
> UJ  10.42.187.43   9.73 GiB   256          ?                 36384dc5-a183-4a5b-ae2d-ee67c897df3d  rack1
> UN  10.42.106.184  9.95 GiB   256          100.0%            42cd09e9-8efb-472f-ace6-c7bb98634887  rack1
> UN  10.42.169.195  10.35 GiB  256          100.0%            9fcc99a1-6334-4df8-818d-b097b1920bb9  rack1
> UN  10.42.209.245  8.54 GiB   256          100.0%            9b99d5d8-818e-4741-9533-259d0fc0e16d  rack1
>
> I'm not sure what is going on here; it would be very helpful if someone
> could help identify the issue.
>
> Thank you.