Posted to user@cassandra.apache.org by Piavlo <lo...@gmail.com> on 2012/04/30 17:09:24 UTC
strange gossip messages after node reboot with different ip
Hi,
We have a cassandra cluster in ec2.
If I stop a node and start it, the node's IP changes as a result. The
node is recognised as a NEW node and is declared as replacing the previous
node with the same token (but it is, of course, the same node).
In this specific case the node's IP before the stop/start was 10.63.14.214 and
the new IP is 10.54.81.14.
Even though the cluster and the node seem to be working fine for more
than a day after the stop/start of this node, I see the following loop
of messages roughly once a minute:
INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 838)
Node /10.63.14.214 is now part of the cluster
INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 804)
InetAddress /10.63.14.214 is now UP
INFO [GossipStage:1] 2012-04-30 14:18:57,090 StorageService.java (line
1017) Nodes /10.63.14.214 and cassa1a.internal/10.54.81.14 have the same
token 0. Ignoring /10.63.14.214
INFO [GossipTasks:1] 2012-04-30 14:19:11,834 Gossiper.java (line 818)
InetAddress /10.63.14.214 is now dead.
INFO [GossipTasks:1] 2012-04-30 14:19:27,896 Gossiper.java (line 632)
FatClient /10.63.14.214 has been silent for 30000ms, removing from gossip
INFO [GossipStage:1] 2012-04-30 14:20:30,803 Gossiper.java (line 838)
Node /10.63.14.214 is now part of the cluster
...
How come the old IP 10.63.14.214 still pops up as UP and is then declared
DEAD again, and so on and on?
I know that since this is EC2 another node with the same IP could come up,
but I've verified that there is no such node, and it certainly does not run
Cassandra :)
I stop/started another node and observed similar behaviour.
This is version 1.0.8.
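For what it's worth, the loop in the log reads like a small state machine. The sketch below is only my interpretation of the messages above, not Cassandra source; the event and state names are mine, and the 30000ms figure is taken straight from the FatClient log line:

```python
# Illustrative replay of the phantom-node loop seen in the log above.
# Event/state names are mine; only the 30s timeout comes from the log.

FATCLIENT_TIMEOUT_MS = 30_000  # "silent for 30000ms, removing from gossip"

def phantom_cycle(events):
    """Return the states the old IP passes through for a list of gossip events."""
    states = []
    for event in events:
        if event == "gossiped_by_peer":   # stale state arrives from another node
            states.append("UP")           # "InetAddress /10.63.14.214 is now UP"
        elif event == "token_conflict":   # same token as the live node
            states.append("IGNORED")      # "have the same token 0. Ignoring ..."
        elif event == "no_heartbeat":
            states.append("DEAD")         # "InetAddress /10.63.14.214 is now dead."
        elif event == "silent_30s":
            states.append("REMOVED")      # FatClient purged from gossip
    return states

# One iteration of the loop, repeating roughly every minute:
print(phantom_cycle(
    ["gossiped_by_peer", "token_conflict", "no_heartbeat", "silent_30s"]))
# → ['UP', 'IGNORED', 'DEAD', 'REMOVED']
```

The open question is then why the first event keeps recurring, i.e. which peer keeps re-gossiping the stale state after removal.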
Another question: if a node is recognised as new (due to the IP change) but
has the same token, will other nodes stream hinted handoffs to it?
And is there a way to tell Cassandra to also use hostnames, so that if the
IP changes but the node's name is the same and resolves to the new IP, the
cluster treats it as the old node?
Thanks
Alex
Re: strange gossip messages after node reboot with different ip
Posted by Piavlo <lo...@gmail.com>.
Hi Aaron,
Below is the gossipinfo output from a fresh 6-node cluster on which I
stop/started all the nodes one by one ~12 hours ago.
As you can see, gossipinfo reports 11 nodes, but what bothers me is
why it reports STATUS:NORMAL for all of them, and why it decides that a
non-existent node is UP just to announce it's dead a few seconds later.
Also, the number of nodes reported by each gossipinfo invocation can
differ - probably according to which nodes are detected as UP and DOWN at
the time.
Apart from this the cluster seems to be working properly, so I understand
I can ignore these UPs and DOWNs - but it feels wrong, and I'm interested
to understand what exactly makes the non-existent nodes wrongly appear as
UP again.
The nodes which are real have
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5 - the others are non-existent nodes.
# nodetool -h localhost gossipinfo
/10.240.243.92
STATUS:NORMAL,28356863910078205288614550619314017621
DC:eu-west
RACK:1b
LOAD:7.8498918E7
RPC_ADDRESS:0.0.0.0
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
RELEASE_VERSION:1.0.9
/10.55.93.28
STATUS:NORMAL,85070591730234615865843651857942052864
RACK:1a
DC:eu-west
LOAD:7.7833298E7
RPC_ADDRESS:0.0.0.0
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
RELEASE_VERSION:1.0.9
/10.56.46.131
STATUS:NORMAL,56713727820156410577229101238628035242
DC:eu-west
RACK:1c
LOAD:8.2077425E7
RPC_ADDRESS:0.0.0.0
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
RELEASE_VERSION:1.0.9
/10.239.95.94
STATUS:NORMAL,113427455640312821154458202477256070485
RACK:1b
LOAD:6.5744801E7
DC:eu-west
RPC_ADDRESS:0.0.0.0
SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
RELEASE_VERSION:1.0.9
/10.49.37.125
STATUS:NORMAL,85070591730234615865843651857942052864
RACK:1a
LOAD:6.6832903E7
DC:eu-west
RPC_ADDRESS:0.0.0.0
SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
RELEASE_VERSION:1.0.9
/10.239.95.182
STATUS:NORMAL,28356863910078205288614550619314017621
RACK:1b
LOAD:6.775791E7
DC:eu-west
RPC_ADDRESS:0.0.0.0
SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
RELEASE_VERSION:1.0.9
dsc1a.internal/10.226.74.97
STATUS:NORMAL,0
DC:eu-west
RACK:1a
LOAD:7.8533797E7
RPC_ADDRESS:0.0.0.0
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
RELEASE_VERSION:1.0.9
/10.248.81.46
STATUS:NORMAL,56713727820156410577229101238628035242
RACK:1c
LOAD:6.8754218E7
DC:eu-west
RPC_ADDRESS:0.0.0.0
SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
RELEASE_VERSION:1.0.9
/10.228.37.155
STATUS:NORMAL,113427455640312821154458202477256070485
DC:eu-west
RACK:1b
LOAD:7.8066429E7
RPC_ADDRESS:0.0.0.0
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
RELEASE_VERSION:1.0.9
/10.248.83.29
STATUS:NORMAL,141784319550391026443072753096570088106
RACK:1c
LOAD:6.5235089E7
DC:eu-west
RPC_ADDRESS:0.0.0.0
SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
RELEASE_VERSION:1.0.9
/10.250.217.83
STATUS:NORMAL,141784319550391026443072753096570088106
LOAD:7.598275E7
DC:eu-west
RACK:1c
RPC_ADDRESS:0.0.0.0
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
RELEASE_VERSION:1.0.9
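Since each phantom entry shares a token with a real node, one quick way to spot the duplicates is to group the gossipinfo output by token. This is a hypothetical helper of mine, not a nodetool feature, and the parsing assumes exactly the line format shown above:

```python
from collections import defaultdict

def endpoints_by_token(gossipinfo_text):
    """Group endpoints from `nodetool gossipinfo` output by STATUS token.

    A token mapped to more than one endpoint indicates a phantom entry
    left over from an IP change. Assumes the line format shown above.
    """
    by_token = defaultdict(list)
    endpoint = None
    for line in gossipinfo_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(("STATUS:", "DC:", "RACK:", "LOAD:",
                                "RPC_ADDRESS:", "SCHEMA:", "RELEASE_VERSION:")):
            endpoint = line  # e.g. "/10.240.243.92" or "dsc1a.internal/10.226.74.97"
        elif line.startswith("STATUS:"):
            token = line.split(",", 1)[1]  # "STATUS:NORMAL,<token>"
            by_token[token].append(endpoint)
    return by_token

# Two entries from the listing above that claim the same token:
sample = """\
/10.240.243.92
STATUS:NORMAL,28356863910078205288614550619314017621
DC:eu-west
/10.239.95.182
STATUS:NORMAL,28356863910078205288614550619314017621
RACK:1b
"""
dupes = {t: eps for t, eps in endpoints_by_token(sample).items() if len(eps) > 1}
print(dupes)  # both endpoints claim token 28356863910078205288614550619314017621
```

Running this over the full listing above pairs each phantom with its real node (and the SCHEMA values then tell them apart).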
Thanks
Alex
Re: strange gossip messages after node reboot with different ip
Posted by Piavlo <lo...@gmail.com>.
On 05/01/2012 04:16 AM, aaron morton wrote:
> Gossip information about a node can stay in the cluster for up to 3
> days. How long has this been going on for ?
This has been going on for over a week already without any sign of slowing
down; all the nodes that changed IP pop up as UP/DEAD endlessly.
Any ideas?
Thanks
Re: strange gossip messages after node reboot with different ip
Posted by aaron morton <aa...@thelastpickle.com>.
Gossip information about a node can stay in the cluster for up to 3 days. How long has this been going on for?
I'm unsure if this is expected behaviour. But it sounds like Gossip is kicking out the phantom node correctly.
Can you use nodetool gossipinfo on the nodes to capture some artefacts while it is still running?
> How come the old IP 10.63.14.214 still pops up as UP and is then declared DEAD again, and so on and on?
I think this is gossip bouncing information about the node around. Once it has been observed as dead for 3 days it should be purged.
> Another question: if a node is recognised as new (due to the IP change) but has the same token, will other nodes stream hinted handoffs to it?
Hints are stored against the token, not the endpoint address. When a node comes up the process is reversed and the endpoint is mapped to its (new) token.
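A minimal sketch of that idea (the names are mine, not Cassandra's internal API): because hints are keyed by token and the endpoint is only resolved at delivery time, an IP change between storing and delivering a hint doesn't matter:

```python
# Sketch of token-keyed hints surviving an IP change.
# All names here are illustrative, not Cassandra's internals.

hints = {}              # token -> pending hinted writes
token_to_endpoint = {}  # token -> current endpoint

def store_hint(token, mutation):
    hints.setdefault(token, []).append(mutation)

def node_restarted(token, new_endpoint):
    token_to_endpoint[token] = new_endpoint  # endpoint remapped on startup

def deliver_hints(token):
    target = token_to_endpoint[token]        # resolved only at delivery time
    return target, hints.pop(token, [])

store_hint(0, "write-1")             # hinted while 10.63.14.214 was down
node_restarted(0, "10.54.81.14")     # same token 0, new IP after restart
print(deliver_hints(0))              # → ('10.54.81.14', ['write-1'])
```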
> And is there a way to tell Cassandra to also use hostnames, so that if the IP changes but the node's name is the same and resolves to the new IP, the cluster treats it as the old node?
>
Not that I am aware of. It's designed to handle IP addresses changing. AFAIK the log messages are not indicative of a fault; instead they indicate something odd happening with Gossip that is being handled correctly.
Hope that helps.
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com