You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2015/05/11 18:27:01 UTC
[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node

    [ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538117#comment-14538117 ] 

Ariel Weisberg commented on CASSANDRA-7292:
-------------------------------------------

[~brandon.williams] would this have been good retrospective fodder? We are obviously going to test bootstrapping nodes with the original IP just as part of generally manipulating the cluster. What would we have to test to capture stale state issues like this? How much of it is specific to EC2 configs and snitches and are they testable in non-EC2 configs (and is it common to say how we run on GCE?)

> Can't seed new node into ring with (public) ip of an old node
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-7292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7292
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.0.7, Ec2MultiRegionSnitch
>            Reporter: Juho Mäkinen
>            Assignee: Brandon Williams
>              Labels: bootstrap, gossip
>             Fix For: 2.0.15, 2.1.5
>
>         Attachments: 7292.txt, cassandra-replace-address.log
>
>
> This bug prevents node to return with bootstrap into the cluster with its old ip.
> Scenario: five node ec2 cluster spread into three AZ, all in one region. I'm using Ec2MultiRegionSnitch. Nodes are reported with their public ips (as Ec2MultiRegionSnitch requires)
> I simulated a loss of one node by terminating one instance. nodetool status reported correctly that node was down. Then I launched new instance with the old public ip (i'm using elastic ips) with "Dcassandra.replace_address=IP_ADDRESS" but the new node can't join the cluster:
>  INFO 07:20:43,424 Gathering node replacement information for /54.86.191.30
>  INFO 07:20:43,428 Starting Messaging Service on port 9043
>  INFO 07:20:43,489 Handshaking version with /54.86.171.10
>  INFO 07:20:43,491 Handshaking version with /54.86.187.245
> (some delay)
> ERROR 07:21:14,445 Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> 	at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
> 	at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:419)
> 	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650)
> 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
> 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:505)
> 	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362)
> 	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480)
> 	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569)
> It does not help if I remove the "Dcassandra.replace_address=IP_ADDRESS" system property. 
> Also it does not help to remove the node with "nodetool removenode" with or without the cassandra.replace_address property.
> I think this is because the node information is preserved in the gossip info as seen this output of "nodetool gossipinfo"
> /54.86.191.30
>   INTERNAL_IP:172.16.1.231
>   DC:us-east
>   REMOVAL_COORDINATOR:REMOVER,d581309a-8610-40d4-ba30-cb250eda22a8
>   STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664
>   HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d
>   RPC_ADDRESS:0.0.0.0
>   NET_VERSION:7
>   SCHEMA:226f9315-b4b2-32c1-bfe1-f4bb49fccfd5
>   RACK:1b
>   LOAD:7.075290515E9
>   SEVERITY:0.0
>   RELEASE_VERSION:2.0.7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)