Posted to user@cassandra.apache.org by Janne Jalkanen <Ja...@ecyrd.com> on 2013/08/25 10:06:31 UTC

Failed decommission

This is on Cassandra 1.2.8.

Ring state before decommission

--  Address   Load      Owns   Host ID                               Token                                    Rack
UN  10.0.0.1  38.82 GB  33.3%  21a98502-dc74-4ad0-9689-0880aa110409  1                                        1a
UN  10.0.0.2  33.5 GB   33.3%  cba6b27a-4982-4f04-854d-cc73155d5f69  56713727820156407428984779325531226110   1b
UN  10.0.0.3  37.41 GB  0.0%   6ba2c7d4-713e-4c14-8df8-f861fb211b0d  56713727820156407428984779325531226111   1b
UN  10.0.0.4  35.7 GB   33.3%  bf3d4792-f3e0-4062-afe3-be292bc85ed7  113427455640312814857969558651062452222  1c

Trying to decommission the node

ubuntu@10.0.0.3:~$ nodetool decommission
Exception in thread "main" java.lang.NumberFormatException: For input string: "56713727820156407428984779325531226111"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:444)
	at java.lang.Long.parseLong(Long.java:483)
	at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
	at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1515)
	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1234)
	at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:949)
	at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1116)
	at org.apache.cassandra.service.StorageService.leaveRing(StorageService.java:2817)
	at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2861)
	at org.apache.cassandra.service.StorageService.decommission(StorageService.java:2808)

Now I'm in a state where the machine is still "up" but "leaving", and I can't seem to get it out of the ring.  For example:

% nodetool removenode 6ba2c7d4-713e-4c14-8df8-f861fb211b0d
Exception in thread "main" java.lang.UnsupportedOperationException: Node /10.0.0.3 is alive and owns this ID. Use decommission command to remove it from the ring

Any ideas?

/Janne

Re: Failed decommission

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
Thanks; this worked for me too.

/Janne

On Aug 25, 2013, at 18:47 , Mike Heffner <mi...@librato.com> wrote:

> Janne,
> 
> We ran into this too. Appears it's a bug in 1.2.8 that is fixed in the upcoming 1.2.9. I added the steps I took to finally remove the node here: https://issues.apache.org/jira/browse/CASSANDRA-5857?focusedCommentId=13748998&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13748998
> 
> 
> Cheers,
> 
> Mike


Re: Failed decommission

Posted by Jon Haddad <jo...@jonhaddad.com>.
We ran into a similar issue as well.  I believe we removed the node via cqlsh from the system keyspace, restarted the cluster, then ran a repair.  I'm not sure how safe this really is though.


On Aug 25, 2013, at 8:47 AM, Mike Heffner <mi...@librato.com> wrote:

> Janne,
> 
> We ran into this too. Appears it's a bug in 1.2.8 that is fixed in the upcoming 1.2.9. I added the steps I took to finally remove the node here: https://issues.apache.org/jira/browse/CASSANDRA-5857?focusedCommentId=13748998&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13748998
> 
> 
> Cheers,
> 
> Mike


Re: Failed decommission

Posted by Mike Heffner <mi...@librato.com>.
Janne,

We ran into this too. Appears it's a bug in 1.2.8 that is fixed in the
upcoming 1.2.9. I added the steps I took to finally remove the node here:
https://issues.apache.org/jira/browse/CASSANDRA-5857?focusedCommentId=13748998&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13748998


Cheers,

Mike


-- 

  Mike Heffner <mi...@librato.com>
  Librato, Inc.

Re: Failed decommission

Posted by Nate McCall <na...@thelastpickle.com>.
This is what I was seeing code-wise as well, but Mike's answer was spot
on. Glad you got this straightened out. (And huge thanks to Mike for coming
back to post a work-around here and on the ticket.)


On Sun, Aug 25, 2013 at 11:42 AM, Janne Jalkanen <Ja...@ecyrd.com> wrote:

>
> This would be RP (cluster upgraded from 0.6->0.8->1.0->1.1 ;-). Looks to
> me like decommission assumes Murmur and 64-bit tokens.
>
> /Janne
>
> On Aug 25, 2013, at 17:25 , Nate McCall <na...@thelastpickle.com> wrote:
>
> Are you using Murmur3 or the older Random partitioner on this cluster?

Re: Failed decommission

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
This would be RandomPartitioner (the cluster was upgraded from 0.6->0.8->1.0->1.1 ;-). Looks to me like decommission assumes Murmur3 and 64-bit tokens.
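
A small Java sketch of why the trace above ends in Long.parseLong: the token string reported in the exception does not fit in a signed 64-bit long, while an arbitrary-precision type such as BigInteger parses it fine. This is only an illustration. The class and method names (TokenParseDemo, fitsInLong) are made up for this example, it is not Cassandra's code, and it does not claim to show the actual root cause tracked in CASSANDRA-5857.

```java
import java.math.BigInteger;

public class TokenParseDemo {
    // Token of 10.0.0.3, copied from the ring output and the exception message.
    static final String TOKEN = "56713727820156407428984779325531226111";

    // Hypothetical helper: true if the decimal string fits in a signed 64-bit long.
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            // Same exception type as in the nodetool stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        // BigInteger has no fixed width, so the same string parses without error.
        BigInteger token = new BigInteger(TOKEN);

        System.out.println("fits in long:   " + fitsInLong(TOKEN));
        System.out.println("token bits:     " + token.bitLength());
        System.out.println("Long.MAX_VALUE: " + Long.MAX_VALUE);
    }
}
```

A long has 63 value bits, so any token wider than that is rejected by Long.parseLong regardless of what the surrounding code intended to parse.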

/Janne

On Aug 25, 2013, at 17:25 , Nate McCall <na...@thelastpickle.com> wrote:

> Are you using Murmur3 or the older Random partitioner on this cluster?


Re: Failed decommission

Posted by Nate McCall <na...@thelastpickle.com>.
Are you using Murmur3 or the older Random partitioner on this cluster?

