
Posted to user@cassandra.apache.org by S C <as...@outlook.com> on 2013/04/05 01:11:36 UTC

gossip not working

I was in the middle of an upgrade to 1.1.9. I brought up one node on 1.1.9 while the others were still running 1.1.5. Once that node was on 1.1.9, it no longer recognized the other nodes in the ring.

On 192.168.56.10 and 11

192.168.56.10  DC1-Cass    RAC1        Up     Normal  28.06 GB        50.00%              0
192.168.56.11  DC1-Cass    RAC1        Up     Normal  31.59 GB        25.00%              42535295865117307932921825928971026432
192.168.56.12  DC1-Cass    RAC1        Down   Normal  29.02 GB        25.00%              85070591730234615865843651857942052864

On 192.168.56.12

192.168.56.10  DC1-Cass    RAC1        Down   Normal  28.06 GB        50.00%              0
192.168.56.11  DC1-Cass    RAC1        Down   Normal  31.59 GB        25.00%              42535295865117307932921825928971026432
192.168.56.12  DC1-Cass    RAC1        Up     Normal  29.02 GB        25.00%              85070591730234615865843651857942052864

I do not see anything in the logs that tells me that there is a gossip issue.

nodetool info
Token            : 85070591730234615865843651857942052864
Gossip active    : true
Thrift active    : true
Load             : 29.05 GB
Generation No    : 1365114563
Uptime (seconds) : 2127
Heap Memory (MB) : 848.71 / 7945.94
Exceptions       : 0
Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

nodetool info
Token            : 42535295865117307932921825928971026432
Gossip active    : true
Thrift active    : true
Load             : 31.59 GB
Generation No    : 1364413038
Uptime (seconds) : 703904
Heap Memory (MB) : 733.02 / 7945.94
Exceptions       : 1
Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

There is no firewall between the nodes, and they can reach each other on the storage port. What else should I be looking at to find the root cause? I appreciate your input.
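
One quick way to see which peers a given node considers down is to filter its ring output on the Status column. The following is only a sketch: the helper name down_nodes is made up for illustration, and it assumes the row layout shown in the post above, where Status is the fourth whitespace-separated field.

```shell
# down_nodes (hypothetical helper): read `nodetool ring`-style rows on stdin
# and print the address (1st field) of every row whose Status (4th field)
# is "Down". Header or blank lines do not match and are skipped.
down_nodes() {
  awk '$4 == "Down" { print $1 }'
}

# The view reported by 192.168.56.12, copied from the post above.
down_nodes <<'EOF'
192.168.56.10  DC1-Cass    RAC1        Down   Normal  28.06 GB        50.00%              0
192.168.56.11  DC1-Cass    RAC1        Down   Normal  31.59 GB        25.00%              42535295865117307932921825928971026432
192.168.56.12  DC1-Cass    RAC1        Up     Normal  29.02 GB        25.00%              85070591730234615865843651857942052864
EOF
```

Against a live cluster the same filter could be fed directly, for example with nodetool -h 192.168.56.12 ring | down_nodes, and repeated for each node to compare their views.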

RE: gossip not working

Posted by S C <as...@outlook.com>.

I did try this option, and everything is working fine. Thank you, Aaron.

From: aaron@thelastpickle.com
Subject: Re: gossip not working
Date: Fri, 5 Apr 2013 23:02:58 +0530
To: user@cassandra.apache.org

Starting the node with the JVM option -Dcassandra.load_ring_state=false in cassandra-env.sh sometimes works.
If not, post the output from nodetool gossipinfo.
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/04/2013, at 9:38 AM, S C <as...@outlook.com> wrote:

Is there a way to force gossip among the nodes?


Re: gossip not working

Posted by aaron morton <aa...@thelastpickle.com>.

Starting the node with the JVM option -Dcassandra.load_ring_state=false in cassandra-env.sh sometimes works.

If not, post the output from nodetool gossipinfo.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
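
As a minimal sketch of the first suggestion above, assuming a stock cassandra-env.sh that builds up its options in the JVM_OPTS variable (the file's location varies by install), the option can be appended as shown below. It is usually treated as a one-off: it asks the node to skip the ring state it saved locally and relearn the ring from gossip, so the line would normally be removed again once the node has rejoined.

```shell
# In cassandra-env.sh (assumed to append to JVM_OPTS, as the stock file does):
# do not load the locally saved ring state on the next startup.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```

After restarting the node, nodetool ring on each node should show whether the views agree again; if they do not, nodetool gossipinfo produces the output requested above.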

On 5/04/2013, at 9:38 AM, S C <as...@outlook.com> wrote:

> Is there a way to force gossip among the nodes?
> 
> From: asf11@outlook.com
> To: user@cassandra.apache.org
> Subject: RE: gossip not working
> Date: Thu, 4 Apr 2013 19:59:45 -0500
> 
> I am not seeing anything in the logs other than "Starting up server gossip" 
> and there is no firewall between the nodes.
> From: paulsudol@gmail.com
> Subject: Re: gossip not working
> Date: Thu, 4 Apr 2013 18:49:29 -0500
> To: user@cassandra.apache.org
> 
> What errors are you seeing in the log files of the down nodes? Did you run upgradesstables? You need to upgradesstables when moving from < 1.1.7 to 1.1.9
> 
> On Apr 4, 2013, at 6:11 PM, S C <as...@outlook.com> wrote:
> 
> I was in the middle of upgrade to 1.1.9. I brought one node with 1.1.9 while the other were running on 1.1.5. Once one of the node was on 1.1.9 it is no longer recognizing other nodes in the ring.
> 
> On 192.168.56.10 and 11
> 
> 192.168.56.10  DC1-Cass    RAC1        Up     Normal  28.06 GB        50.00%              0                                           
> 192.168.56.11  DC1-Cass    RAC1        Up     Normal  31.59 GB        25.00%              42535295865117307932921825928971026432      
> 192.168.56.12  DC1-Cass    RAC1        Down   Normal  29.02 GB        25.00%              85070591730234615865843651857942052864    
> 
> 
> On 192.168.56.12
> 
> 192.168.56.10  DC1-Cass    RAC1        Down     Normal  28.06 GB        50.00%              0                                           
> 192.168.56.11  DC1-Cass    RAC1        Down     Normal  31.59 GB        25.00%              42535295865117307932921825928971026432      
> 192.168.56.12  DC1-Cass    RAC1        Up   Normal  29.02 GB        25.00%              85070591730234615865843651857942052864    
> 
> 
> I do not see anything in the logs that tells me that there is a gossip issue.
> 
> nodetool info
> Token            : 85070591730234615865843651857942052864
> Gossip active    : true
> Thrift active    : true
> Load             : 29.05 GB
> Generation No    : 1365114563
> Uptime (seconds) : 2127
> Heap Memory (MB) : 848.71 / 7945.94
> Exceptions       : 0
> Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
> Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> 
> nodetool info
> Token            : 42535295865117307932921825928971026432
> Gossip active    : true
> Thrift active    : true
> Load             : 31.59 GB
> Generation No    : 1364413038
> Uptime (seconds) : 703904
> Heap Memory (MB) : 733.02 / 7945.94
> Exceptions       : 1
> Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
> Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> 
> 
> 
> There is no firewall between the nodes and I can reach each other on storage port. 
> What else should I be looking at to find root cause? Appreciate your inputs.

RE: gossip not working

Posted by S C <as...@outlook.com>.

Is there a way to force gossip among the nodes?


RE: gossip not working

Posted by S C <as...@outlook.com>.

I am not seeing anything in the logs other than "Starting up server gossip" and there is no firewall between the nodes.

Re: gossip not working

Posted by Paul Sudol <pa...@gmail.com>.

What errors are you seeing in the log files of the down nodes? Did you run upgradesstables? You need to run upgradesstables when moving from < 1.1.7 to 1.1.9.

On Apr 4, 2013, at 6:11 PM, S C <as...@outlook.com> wrote:

> I was in the middle of upgrade to 1.1.9. I brought one node with 1.1.9 while the other were running on 1.1.5. Once one of the node was on 1.1.9 it is no longer recognizing other nodes in the ring.
> 
> On 192.168.56.10 and 11
> 
> 192.168.56.10  DC1-Cass    RAC1        Up     Normal  28.06 GB        50.00%              0                                           
> 192.168.56.11  DC1-Cass    RAC1        Up     Normal  31.59 GB        25.00%              42535295865117307932921825928971026432      
> 192.168.56.12  DC1-Cass    RAC1        Down   Normal  29.02 GB        25.00%              85070591730234615865843651857942052864    
> 
> 
> On 192.168.56.12
> 
> 192.168.56.10  DC1-Cass    RAC1        Down     Normal  28.06 GB        50.00%              0                                           
> 192.168.56.11  DC1-Cass    RAC1        Down     Normal  31.59 GB        25.00%              42535295865117307932921825928971026432      
> 192.168.56.12  DC1-Cass    RAC1        Up   Normal  29.02 GB        25.00%              85070591730234615865843651857942052864    
> 
> 
> I do not see anything in the logs that tells me that there is a gossip issue.
> 
> nodetool info
> Token            : 85070591730234615865843651857942052864
> Gossip active    : true
> Thrift active    : true
> Load             : 29.05 GB
> Generation No    : 1365114563
> Uptime (seconds) : 2127
> Heap Memory (MB) : 848.71 / 7945.94
> Exceptions       : 0
> Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
> Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> 
> nodetool info
> Token            : 42535295865117307932921825928971026432
> Gossip active    : true
> Thrift active    : true
> Load             : 31.59 GB
> Generation No    : 1364413038
> Uptime (seconds) : 703904
> Heap Memory (MB) : 733.02 / 7945.94
> Exceptions       : 1
> Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
> Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> 
> 
> 
> There is no firewall between the nodes and I can reach each other on storage port. 
> What else should I be looking at to find root cause? Appreciate your inputs.