Posted to user@cassandra.apache.org by David McNelis <dm...@gmail.com> on 2013/06/18 04:59:17 UTC

Node failing to decommission (vnodes and 1.2.5)

I have a node in my ring (1.2.5) that, when it was set up, had the wrong
number of vnodes assigned (double the amount it should have had).

As  a result, and because we can't reduce the number of vnodes on a machine
(at least at this point), I need to decommission the node.
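
For context, the vnode count comes from num_tokens in cassandra.yaml and only
takes effect when a node first bootstraps, which is why the fix is to rebuild
the node rather than edit it in place. A rough sketch of what the rebuilt
node's config should look like (the path assumes a package install, and 256 is
just the 1.2 default):

# /etc/cassandra/cassandra.yaml on the rebuilt node
# num_tokens is only read at first bootstrap; changing it later does not
# shrink the vnodes an existing node already owns.
num_tokens: 256    # match whatever the rest of the ring uses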

The problem is that we've tried running decommission several times.  In
each instance we'll have a lot of streams to other nodes for a period, and
then eventually, netstats will tell us:

nodetool -h localhost netstats
Mode: LEAVING
 Nothing streaming to /10.x.x.1
 Nothing streaming to /10.x.x.2
 Nothing streaming to /10.x.x.3
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         955991
Responses                       n/a         0        2947860
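
While the node sits in LEAVING we keep re-checking that output, e.g.
(assuming watch is available on the box):

watch -n 30 nodetool -h localhost netstats    # poll streaming state every 30s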

I'm also not seeing anything in the node's log files to suggest errors
during streaming or leaving.

Then the node will stay in this leaving state for... well, we gave up after
several days of no more activity and retried several times.  Each time we
"gave up" on it, we restarted the service and it was no longer listed as
Leaving, just active.  Even when in a "leaving" state, the size of data on
the node continued to grow.
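
To spell out what a retry looked like each time (the service name assumes a
package install):

sudo service cassandra restart      # node comes back reporting Up/Normal
nodetool -h localhost status        # confirm it is no longer marked Leaving
nodetool -h localhost decommission  # then kick off the decommission again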

What suggestions does anyone have on getting this node removed from my ring
so I can rebuild it with the correct number of tokens, before I end up with
a disk space issue from too many vnodes?

Re: Node failing to decommission (vnodes and 1.2.5)

Posted by David McNelis <dm...@gmail.com>.
Never saw "decommissioned" in the logs, and nodetool status still shows the
node as "UL".

Removenode sounds like it's likely to get the job done for us at this point.

Thanks.

David



Re: Node failing to decommission (vnodes and 1.2.5)

Posted by aaron morton <aa...@thelastpickle.com>.
> I also am not seeing anything in the node's log files to suggest errors during streaming or leaving.
You should see a log message saying "DECOMMISSIONED" when the process completes. 
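
A quick way to check for that marker (the log path assumes a package install):

grep DECOMMISSIONED /var/log/cassandra/system.log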

What does nodetool status say?
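
For reading the output: the two-letter column is Up/Down followed by
Normal/Leaving/Joining/Moving, so a node that is mid-decommission shows UL and
should drop out of the list once it finishes.

nodetool status
# Status = Up/Down, State = Normal/Leaving/Joining/Moving
# UN = up and normal, UL = up but still leaving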

> What suggestions does anyone have on getting this node removed from my ring so I can rebuild it with the correct number of tokens, before I end up with a disk space issue from too many vnodes?
If you really want to get the node out of there, shut it down and run nodetool removenode on one of the remaining nodes.
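
Roughly, as a sketch (the Host ID comes from nodetool status on a live node,
and the service name assumes a package install):

# on the node you are removing
sudo service cassandra stop

# on any remaining node
nodetool status                  # note the Host ID listed for the down node
nodetool removenode <host-id>    # replace <host-id> with that UUID
nodetool removenode status       # check progress if it runs for a while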

Cheers
 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
