Posted to user@cassandra.apache.org by Thomas van Neerijnen <to...@bossastudios.com> on 2012/10/18 15:44:20 UTC

replaced node keeps returning in gossip

Hi all

I'm running Cassandra 1.0.11 on Ubuntu 11.10.

I've got a ghost node which keeps showing up on my ring.

A node living on IP 10.16.128.210 and token 0 died and had to be replaced.
I replaced it with a new node, IP 10.16.128.197 and again token 0 with a
"-Dcassandra.replace_token=0" at startup. This all went well but now I'm
seeing the following weirdness constantly reported in the log files around
the ring:

 INFO [GossipTasks:1] 2012-10-18 13:39:22,441 Gossiper.java (line 632) FatClient /10.16.128.210 has been silent for 30000ms, removing from gossip
 INFO [GossipStage:1] 2012-10-18 13:40:25,933 Gossiper.java (line 838) Node /10.16.128.210 is now part of the cluster
 INFO [GossipStage:1] 2012-10-18 13:40:25,934 Gossiper.java (line 804) InetAddress /10.16.128.210 is now UP
 INFO [GossipStage:1] 2012-10-18 13:40:25,937 StorageService.java (line 1017) Nodes /10.16.128.210 and /10.16.128.197 have the same token 0.  Ignoring /10.16.128.210
 INFO [GossipTasks:1] 2012-10-18 13:40:37,509 Gossiper.java (line 818) InetAddress /10.16.128.210 is now dead.
 INFO [GossipTasks:1] 2012-10-18 13:40:56,526 Gossiper.java (line 632) FatClient /10.16.128.210 has been silent for 30000ms, removing from gossip
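
To see how often the dead node flaps back into gossip, a short script along
these lines could tally the events per IP. This is only a sketch: the regular
expression is inferred from the six log entries above rather than from
Cassandra's logging configuration, and the names gossip_events, LOG_RE, IP_RE
and SAMPLE are made up for illustration. It assumes each log entry sits on a
single line in the real log file, without the wrapping a mail client adds.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: LEVEL [thread] timestamp Source.java (line N) message
LOG_RE = re.compile(
    r"^\s*(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<source>\S+) \(line (?P<line>\d+)\) (?P<msg>.*)$"
)
# Endpoints are printed as "/a.b.c.d" in these messages.
IP_RE = re.compile(r"/(\d{1,3}(?:\.\d{1,3}){3})")


def gossip_events(lines, ip):
    """Count the kinds of gossip messages that mention the given IP."""
    counts = Counter()
    for raw in lines:
        match = LOG_RE.match(raw)
        if match is None:
            continue  # not a log entry in the assumed layout
        msg = match.group("msg")
        if ip not in IP_RE.findall(msg):
            continue
        if "is now part of the cluster" in msg:
            counts["joined"] += 1
        elif "is now UP" in msg:
            counts["up"] += 1
        elif "is now dead" in msg:
            counts["dead"] += 1
        elif "removing from gossip" in msg:
            counts["removed"] += 1
        elif "have the same token" in msg:
            counts["token_conflict"] += 1
    return counts


# The six entries from the excerpt, one per line.
SAMPLE = [
    " INFO [GossipTasks:1] 2012-10-18 13:39:22,441 Gossiper.java (line 632) "
    "FatClient /10.16.128.210 has been silent for 30000ms, removing from gossip",
    " INFO [GossipStage:1] 2012-10-18 13:40:25,933 Gossiper.java (line 838) "
    "Node /10.16.128.210 is now part of the cluster",
    " INFO [GossipStage:1] 2012-10-18 13:40:25,934 Gossiper.java (line 804) "
    "InetAddress /10.16.128.210 is now UP",
    " INFO [GossipStage:1] 2012-10-18 13:40:25,937 StorageService.java (line 1017) "
    "Nodes /10.16.128.210 and /10.16.128.197 have the same token 0.  "
    "Ignoring /10.16.128.210",
    " INFO [GossipTasks:1] 2012-10-18 13:40:37,509 Gossiper.java (line 818) "
    "InetAddress /10.16.128.210 is now dead.",
    " INFO [GossipTasks:1] 2012-10-18 13:40:56,526 Gossiper.java (line 632) "
    "FatClient /10.16.128.210 has been silent for 30000ms, removing from gossip",
]

if __name__ == "__main__":
    print(gossip_events(SAMPLE, "10.16.128.210"))
```

Run against the full system log of each node, the "joined" count per hour
would show whether the old IP is still being re-announced or has gone quiet.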

Re: replaced node keeps returning in gossip

Posted by Thomas van Neerijnen <to...@bossastudios.com>.
Hi

When I sent the mail the new node had been up for about an hour, and the old
node had died about an hour before that.
The weirdness in the log files stopped yesterday afternoon, about 4 or 5
hours after I replaced the node, so it seems to have resolved itself.
Seeing as there's no error to look at in the log files, I'm not sure if you
still want the output of my gossipinfo, but I've pasted it below anyway.
Thanks!

/10.16.96.212
  LOAD:7.8018521345E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,113427455640312821154458202477256070484
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
/10.16.128.211
  LOAD:7.8416250275E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,85070591730234615865843651857942052863
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
/10.16.32.210
  LOAD:1.29054735121E11
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,56713727820156407428984779325531226112
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
/10.16.32.211
  LOAD:7.2937725831E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,141784319550391032739561396922763706368
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
ip-10-16-128-197.localdomain/10.16.128.197
  LOAD:6.5571879526E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,0
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
/10.16.96.211
  LOAD:1.0633383453E11
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,28356863910078203714492389662765613056
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
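
For anyone who wants to check output like this mechanically, the sketch below
parses gossipinfo text into a dictionary and reports which endpoints claim
each token. It assumes the layout shown above (an unindented endpoint line,
optionally prefixed with a hostname, followed by indented KEY:value lines).
The function names parse_gossipinfo and tokens_by_owner and the SAMPLE string
are hypothetical, and SAMPLE is a trimmed copy of two of the entries above.

```python
def parse_gossipinfo(text):
    """Map each endpoint IP to a dict of its gossip state fields."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Endpoint header: "/10.16.96.212" or "hostname/10.16.128.197"
            current = line.strip().rsplit("/", 1)[-1]
            nodes[current] = {}
        elif current is not None:
            # Indented field line: split on the first colon only
            key, _, value = line.strip().partition(":")
            nodes[current][key] = value
    return nodes


def tokens_by_owner(nodes):
    """Group endpoint IPs by the token in a NORMAL status entry."""
    owners = {}
    for ip, state in nodes.items():
        status, _, token = state.get("STATUS", "").partition(",")
        if status == "NORMAL":
            owners.setdefault(token, []).append(ip)
    return owners


SAMPLE = """\
/10.16.96.212
  LOAD:7.8018521345E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,113427455640312821154458202477256070484
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
ip-10-16-128-197.localdomain/10.16.128.197
  LOAD:6.5571879526E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,0
  SCHEMA:9b152e00-fd90-11e1-0000-2d22988ca597
"""

if __name__ == "__main__":
    nodes = parse_gossipinfo(SAMPLE)
    # The replaced IP should be absent, and token 0 should have one owner.
    print("old node present:", "10.16.128.210" in nodes)
    for token, ips in tokens_by_owner(nodes).items():
        print(token, ips)
```

A token that lists more than one IP, or an entry for 10.16.128.210, would
indicate that the old node is still known to gossip on the node queried.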



Re: replaced node keeps returning in gossip

Posted by aaron morton <aa...@thelastpickle.com>.
> I replaced it with a new node, IP 10.16.128.197 and again token 0 with a "-Dcassandra.replace_token=0" at startup
Good Good. 

How long ago did you bring the new node online? There is a fail-safe that removes 128.210 after 3 days if it does not gossip to other nodes.

I *thought* that replace_token would remove the old IP from the ring. Can you post the output of nodetool gossipinfo from the 128.197 node?

Thanks
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
