Posted to commits@cassandra.apache.org by "David Strauss (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/01/13 02:53:39 UTC

[jira] [Issue Comment Edited] (CASSANDRA-3736) -Dreplace_token leaves old node (IP) in the gossip with the token.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185385#comment-13185385 ] 

David Strauss edited comment on CASSANDRA-3736 at 1/13/12 1:52 AM:
-------------------------------------------------------------------

Hi Vijay,

I filed the ticket with DataStax that prompted this issue. I'm not 100% certain whether the node we replaced was fully and consistently offline from the point we performed the replacement. I *believe* it was, especially because -Dreplace_token refuses to work if the node being replaced is online -- and we took no further action to bring the replaced node back (its VM wasn't initializing any network interfaces other than "lo").

Even if the replaced node comes back, it shouldn't be allowed to re-join the ring with a token already owned by an "Up" node. It should be subjected to the same condition -Dreplace_token is, where the token being used by the new ring member must be owned by a "Down" node.
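
The rule proposed above can be sketched roughly as follows. This is only an illustration, not Cassandra code: the class TokenJoinRule, its Liveness enum, and the methods recordOwner and mayClaim are hypothetical names, and the map is a simplified stand-in for the ring view a node builds from gossip.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposed rule; none of these names exist in Cassandra.
final class TokenJoinRule {
    enum Liveness { UP, DOWN }

    // Simplified ring view: token -> liveness of the endpoint that currently owns it.
    private final Map<BigInteger, Liveness> ownerState = new HashMap<>();

    // Record (or update) the liveness of the current owner of a token.
    void recordOwner(BigInteger token, Liveness liveness) {
        ownerState.put(token, liveness);
    }

    // A node may claim a token only if the token is unowned or its owner is DOWN.
    // A claim on a token whose owner is UP is rejected.
    boolean mayClaim(BigInteger token) {
        Liveness current = ownerState.get(token);
        return current == null || current == Liveness.DOWN;
    }
}
```

Under this sketch, once the replacement node owns the token and is marked UP, a claim for the same token from the old IP would be rejected rather than merely ignored.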

David
                
      was (Author: davidstrauss):
    Hi Vijay,

I filed the ticket with DataStax that prompted this issue. I'm not 100% certain whether the node we replaced was fully and consistently offline from the point we performed the replacement. I *believe* it was, especially because the -Dreplace_token refuses to work if the node being replaced is online and we took no further action to bring the replaced node back (its VM wasn't initializing any network interfaces other than "lo").

Even if the replaced node comes back, it shouldn't be allowed to re-join the ring with a token already owned by an "Up" node. It should be subjected to the same condition -Dreplace_token is, where the token being used by the new ring member must be owned by a "Down" node.

David
                  
> -Dreplace_token leaves old node (IP) in the gossip with the token.
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-3736
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3736
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jackson Chung
>            Assignee: Vijay
>             Fix For: 1.0.7
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-957 introduced -Dreplace_token;
> however, the replaced IP keeps showing up in the Gossiper when starting the replacement node:
> {noformat}
>  INFO [Thread-2] 2012-01-12 23:59:35,162 CassandraDaemon.java (line 213) Listening for thrift clients...
>  INFO [GossipStage:1] 2012-01-12 23:59:35,173 Gossiper.java (line 836) Node /50.56.59.68 has restarted, now UP
>  INFO [GossipStage:1] 2012-01-12 23:59:35,174 Gossiper.java (line 804) InetAddress /50.56.59.68 is now UP
>  INFO [GossipStage:1] 2012-01-12 23:59:35,175 StorageService.java (line 988) Node /50.56.59.68 state jump to normal
>  INFO [GossipStage:1] 2012-01-12 23:59:35,176 Gossiper.java (line 836) Node /50.56.58.55 has restarted, now UP
>  INFO [GossipStage:1] 2012-01-12 23:59:35,176 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
>  INFO [GossipStage:1] 2012-01-12 23:59:35,177 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864.  Ignoring /50.56.58.55
>  INFO [GossipTasks:1] 2012-01-12 23:59:45,048 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
>  INFO [GossipTasks:1] 2012-01-13 00:00:06,062 Gossiper.java (line 632) FatClient /50.56.58.55 has been silent for 30000ms, removing from gossip
>  INFO [GossipStage:1] 2012-01-13 00:01:06,320 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
>  INFO [GossipStage:1] 2012-01-13 00:01:06,320 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
>  INFO [GossipStage:1] 2012-01-13 00:01:06,321 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864.  Ignoring /50.56.58.55
>  INFO [GossipTasks:1] 2012-01-13 00:01:16,106 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
>  INFO [GossipTasks:1] 2012-01-13 00:01:37,121 Gossiper.java (line 632) FatClient /50.56.58.55 has been silent for 30000ms, removing from gossip
>  INFO [GossipStage:1] 2012-01-13 00:02:37,352 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
>  INFO [GossipStage:1] 2012-01-13 00:02:37,353 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
>  INFO [GossipStage:1] 2012-01-13 00:02:37,353 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864.  Ignoring /50.56.58.55
>  INFO [GossipTasks:1] 2012-01-13 00:02:47,158 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
>  INFO [GossipStage:1] 2012-01-13 00:02:50,162 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
>  INFO [GossipStage:1] 2012-01-13 00:02:50,163 StorageService.java (line 1156) Removing token 122029383590318827259508597176866581733 for /50.56.58.55
> {noformat}
> In the above, /50.56.58.55 was the replaced IP.
> I tried adding "Gossiper.instance.removeEndpoint(endpoint);" in StorageService.java at the point where the message "Nodes %s and %s have the same token %s.  Ignoring %s" is logged, but this seems to have fixed the problem only temporarily. Here is the ring output:
> {noformat}
> riptano@action-quick:~/work/cassandra$ ./bin/nodetool -h localhost ring
> Address         DC          Rack        Status State   Load            Owns    Token                                       
>                                                                                85070591730234615865843651857942052864      
> 50.56.59.68     datacenter1 rack1       Up     Normal  6.67 KB         85.56%  60502102442797279294142560823234402248      
> 50.56.31.186    datacenter1 rack1       Up     Normal  11.12 KB        14.44%  85070591730234615865843651857942052864 
> {noformat}
> gossipinfo:
> {noformat}
> $ ./bin/nodetool -h localhost gossipinfo
> /50.56.58.55
>   LOAD:6835.0
>   SCHEMA:00000000-0000-1000-0000-000000000000
>   RPC_ADDRESS:50.56.58.55
>   STATUS:NORMAL,85070591730234615865843651857942052864
>   RELEASE_VERSION:1.0.7-SNAPSHOT
> /50.56.59.68
>   LOAD:6835.0
>   SCHEMA:00000000-0000-1000-0000-000000000000
>   RPC_ADDRESS:50.56.59.68
>   STATUS:NORMAL,60502102442797279294142560823234402248
>   RELEASE_VERSION:1.0.7-SNAPSHOT
> action-quick2/50.56.31.186
>   LOAD:11387.0
>   SCHEMA:00000000-0000-1000-0000-000000000000
>   RPC_ADDRESS:50.56.31.186
>   STATUS:NORMAL,85070591730234615865843651857942052864
>   RELEASE_VERSION:1.0.7-SNAPSHOT
> {noformat}
> Note that at one point earlier it seems to have been removed:
> $ ./bin/nodetool -h localhost gossipinfo
> /50.56.59.68
>   LOAD:13815.0
>   SCHEMA:00000000-0000-1000-0000-000000000000
>   RPC_ADDRESS:50.56.59.68
>   STATUS:NORMAL,60502102442797279294142560823234402248
>   RELEASE_VERSION:1.0.7-SNAPSHOT
> action-quick2/50.56.31.186
>   LOAD:13725.0
>   SCHEMA:00000000-0000-1000-0000-000000000000
>   RPC_ADDRESS:50.56.31.186
>   STATUS:NORMAL,85070591730234615865843651857942052864
>   RELEASE_VERSION:1.0.7-SNAPSHOT
> riptano@action-quick2:~/work/cassandra$  INFO [GossipStage:1] 2012-01-13 01:03:30,073 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
>  INFO [GossipStage:1] 2012-01-13 01:03:30,073 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
>  INFO [GossipStage:1] 2012-01-13 01:03:30,074 StorageService.java (line 1017) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864.  Ignoring /50.56.58.55
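
The symptom in the gossipinfo output above (two endpoints advertising the same NORMAL token) can be checked mechanically. The following is a minimal sketch, assuming the raw `nodetool gossipinfo` text layout shown in this issue (an unindented endpoint line followed by indented KEY:value lines, without the "> " quoting of this email). The class GossipTokenAudit and its methods are hypothetical helpers, not part of Cassandra or nodetool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds tokens advertised by more than one endpoint
// in gossipinfo-style text. Assumes the layout shown in this issue.
final class GossipTokenAudit {
    private static final String NORMAL_PREFIX = "STATUS:NORMAL,";

    // Groups endpoints by the token found in their STATUS:NORMAL,<token> line.
    static Map<String, List<String>> endpointsByToken(String gossipInfo) {
        Map<String, List<String>> byToken = new LinkedHashMap<>();
        String endpoint = null;
        for (String raw : gossipInfo.split("\\R")) {
            if (raw.trim().isEmpty()) {
                continue;
            }
            if (!Character.isWhitespace(raw.charAt(0))) {
                // Unindented line: start of a new endpoint section.
                endpoint = raw.trim();
                continue;
            }
            String line = raw.trim();
            if (endpoint != null && line.startsWith(NORMAL_PREFIX)) {
                String token = line.substring(NORMAL_PREFIX.length());
                byToken.computeIfAbsent(token, k -> new ArrayList<>()).add(endpoint);
            }
        }
        return byToken;
    }

    // Keeps only the tokens claimed by two or more endpoints.
    static Map<String, List<String>> sharedTokens(String gossipInfo) {
        Map<String, List<String>> shared = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : endpointsByToken(gossipInfo).entrySet()) {
            if (e.getValue().size() > 1) {
                shared.put(e.getKey(), e.getValue());
            }
        }
        return shared;
    }
}
```

Fed the three-endpoint gossipinfo listing from this issue, sharedTokens would report token 85070591730234615865843651857942052864 as held by both /50.56.58.55 and action-quick2/50.56.31.186.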
