
Posted to user@cassandra.apache.org by sai krishnam raju potturi <ps...@gmail.com> on 2016/02/16 20:08:24 UTC

Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

Hi,

We have a 12-node cluster across 2 datacenters, currently running
Cassandra 2.1.12.

SNITCH : GossipingPropertyFileSnitch

We decommissioned a few nodes in one datacenter and observed the
following:

nodetool status shows only the live nodes in the cluster.

nodetool describecluster shows the decommissioned nodes as UNREACHABLE.

nodetool gossipinfo shows the decommissioned nodes as "LEFT"
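
To compare the two views mechanically rather than by eye, something like the
sketch below could be used. It is only an illustration. It assumes that
gossipinfo prints one block per endpoint, headed by a line like "/10.0.0.3"
and containing an indented "STATUS:..." line, and that describecluster prints
an "UNREACHABLE: [...]" line. The function names and sample addresses are
made up, and the exact output format may differ between versions.

```python
import re

# Illustrative sample output only; the addresses and values are made up.
SAMPLE_GOSSIP = """\
/10.0.0.1
  generation:1455000000
  heartbeat:4242
  STATUS:NORMAL,-9151314442816847873
  DC:DC1
/10.0.0.3
  generation:1454000000
  heartbeat:2147483647
  STATUS:LEFT,125897680671740685543105407593050165202,1455902827000
  DC:DC2
"""

SAMPLE_DESCRIBE = """\
Cluster Information:
    Name: Test Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        86afa796-d883-3932-aa73-6b017cef0d19: [10.0.0.1, 10.0.0.2]

        UNREACHABLE: [10.0.0.3]
"""


def left_endpoints(gossipinfo_text):
    """Return the endpoints whose gossip STATUS is LEFT (assumed format)."""
    left = []
    current = None
    for line in gossipinfo_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new endpoint block, e.g. "/10.0.0.3".
            current = line.strip().split("/")[-1]
        elif line.strip().startswith("STATUS:") and current is not None:
            # Take the state name, tolerating an optional version field
            # ("STATUS:LEFT,..." as well as "STATUS:18:LEFT,...").
            state = line.strip().split(":")[-1].split(",")[0]
            if state == "LEFT":
                left.append(current)
    return left


def unreachable_endpoints(describecluster_text):
    """Return the addresses listed on any "UNREACHABLE: [...]" line."""
    found = []
    for group in re.findall(r"UNREACHABLE:\s*\[([^\]]*)\]", describecluster_text):
        found.extend(ip.strip() for ip in group.split(",") if ip.strip())
    return found


def left_but_unreachable(gossipinfo_text, describecluster_text):
    """Endpoints that gossip reports as LEFT but describecluster still lists."""
    left = set(left_endpoints(gossipinfo_text))
    return sorted(left & set(unreachable_endpoints(describecluster_text)))


if __name__ == "__main__":
    print(left_but_unreachable(SAMPLE_GOSSIP, SAMPLE_DESCRIBE))
```

In practice the two text arguments would be the captured output of
"nodetool gossipinfo" and "nodetool describecluster" from the same node.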


After the live nodes were restarted, "nodetool describecluster" showed only
the live nodes, as expected.

Purging the gossip info did not help either.

INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
140213388002871593911508364312533329916,
 98576967436431350637134234839492449485] for /X.X.X.X
INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
INFO  17:27:07 Removing tokens [11116977666116265389022494863106850615,
111270759969411259938117902792984586225,
138611464975439236357814418845450428175] for /X.X.X.X

Has anybody experienced similar behaviour? Restarting the entire cluster
every time a node is decommissioned does not seem right. Thanks in advance
for the help.


thanks
Sai

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

Posted by sai krishnam raju potturi <ps...@gmail.com>.

Thanks, Rajesh. What we have observed is that the decommissioned nodes show
up as "UNREACHABLE" in "nodetool describecluster", while their status is
"LEFT" in "nodetool gossipinfo". This is on version 2.1.12.

On version 2.0.14, which we run in another cluster, decommissioned nodes did
not show up in either "nodetool describecluster" or "nodetool gossipinfo".


thanks
Sai


Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

Posted by sai krishnam raju potturi <ps...@gmail.com>.

Thank you, Ben. We are on Cassandra 2.1.12. We did hit the bug described in
https://issues.apache.org/jira/browse/CASSANDRA-10371 on DSE 4.6.7, in
another cluster. It's strange that we are seeing it on Cassandra 2.1.12 as
well.

  What is new to us is "nodetool describecluster" showing decommissioned
nodes as UNREACHABLE; we are seeing that for the first time.

thanks
Sai

On Wed, Feb 17, 2016 at 12:36 PM, Ben Bromhead <be...@instaclustr.com> wrote:

> I'm not sure what version of Cassandra you are running, so here is some
> general advice:
>
>    - Gossip entries for decommissioned nodes will hang around for a few
>    days to help catch up nodes in the case of a partition. This is why you
>    see the decommissioned nodes listed as LEFT. This is intentional.
>    - If you keep seeing those entries in your logs and you are on 2.0.x,
>    you might be impacted by
>    https://issues.apache.org/jira/browse/CASSANDRA-10371. In this case,
>    upgrade to 2.1 or try the workarounds listed in the ticket.
>
> Ben
>
>> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

Posted by Ben Bromhead <be...@instaclustr.com>.

I'm not sure what version of Cassandra you are running, so here is some
general advice:

   - Gossip entries for decommissioned nodes will hang around for a few
   days to help catch up nodes in the case of a partition. This is why you
   see the decommissioned nodes listed as LEFT. This is intentional.
   - If you keep seeing those entries in your logs and you are on 2.0.x,
   you might be impacted by
   https://issues.apache.org/jira/browse/CASSANDRA-10371. In this case,
   upgrade to 2.1 or try the workarounds listed in the ticket.
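
To check when a LEFT entry is due to expire, the value shown by gossipinfo
can be decoded. The sketch below is a guess at how one might do that, not
something taken from the Cassandra code: it assumes the value has the form
"LEFT,<token>,<expiry in epoch milliseconds>", which should be verified
against your own gossipinfo output. The helper name and the sample timestamp
are hypothetical.

```python
from datetime import datetime, timezone


def left_expiry(status_value):
    """Return the expiry of a LEFT gossip status as a UTC datetime.

    Assumes the value looks like "LEFT,<token>,<expiry_millis>".
    Returns None for any other status or an unexpected shape.
    """
    fields = status_value.strip().split(",")
    if len(fields) < 3 or fields[0] != "LEFT":
        return None
    try:
        expiry_millis = int(fields[-1])
    except ValueError:
        return None
    return datetime.fromtimestamp(expiry_millis / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # Hypothetical value: the token is from the log above, the timestamp is made up.
    sample = "LEFT,125897680671740685543105407593050165202,1455902827000"
    print(left_expiry(sample))
```

If the printed time is already in the past and the entry is still listed,
that would suggest the entry is not being cleaned up as intended.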

Ben

> --
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

Posted by sai krishnam raju potturi <ps...@gmail.com>.

Thanks a lot, Alain. We did rely on "unsafeAssassinate" earlier, which
worked. We were planning to upgrade from 2.0.14 to 2.1.12 on all our
clusters.
  But we are trying to figure out why decommissioned nodes are showing up
in "nodetool describecluster" as "UNREACHABLE".

thanks
Sai

On Wed, Feb 17, 2016 at 5:42 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Hi,
>
>> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>
> I believe this is the expected behaviour: we keep a trace of nodes that
> have left for a few days. This shouldn't be an issue for you.
>
>> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>
> This is weird behaviour that I haven't seen for a while. You might want to
> dig into this some more.
>
>> Restarting the entire cluster,  everytime a node is decommissioned does
>> not seem right
>
> Meanwhile, if you are sure the node is out and streams have ended, I guess
> it could be OK to use a JMX client (MX4J, JConsole...) and call the JMX
> method Gossiper.unsafeAssassinateEndpoint(ip_address) to assassinate the
> gone node from any of the remaining nodes.
>
> How to -->
> http://tumblr.doki-pen.org/post/22654515359/assassinating-cassandra-nodes
> (a 3-year-old post; I only partially read it, but I think it might still be
> relevant)
>
>> Has anybody experienced similar behaviour
>
> FTR, a similar issue I faced 3 years ago -->
> http://grokbase.com/t/cassandra/user/127knx7nn0/unreachable-node-not-in-nodetool-ring
>
> FWIW, for people on C* 3.x, this is exposed through nodetool -->
> https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsAssassinate.html
>
> Keep in mind that something called both 'unsafe' and 'assassinate' is not
> something you want to use in a regular decommissioning process: it drops
> the node with no file transfer, so you basically lose the node entirely.
> (If the node is already out, which seems to be your case, it should be
> safe to use.) I have only used it to fix gossip status, or when forcing a
> removenode was not working, and followed it with full repairs on the
> remaining nodes.
>
> C*heers,
> -----------------
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Hi,

> nodetool gossipinfo shows the decommissioned nodes as "LEFT"

I believe this is the expected behaviour: we keep a trace of nodes that
have left for a few days. This shouldn't be an issue for you.

> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.

This is weird behaviour that I haven't seen for a while. You might want to
dig into this some more.

> Restarting the entire cluster,  everytime a node is decommissioned does not
> seem right

Meanwhile, if you are sure the node is out and streams have ended, I guess
it could be OK to use a JMX client (MX4J, JConsole...) and call the JMX
method Gossiper.unsafeAssassinateEndpoint(ip_address) to assassinate the
gone node from any of the remaining nodes.

How to -->
http://tumblr.doki-pen.org/post/22654515359/assassinating-cassandra-nodes
(a 3-year-old post; I only partially read it, but I think it might still be
relevant)

> Has anybody experienced similar behaviour

FTR, a similar issue I faced 3 years ago -->
http://grokbase.com/t/cassandra/user/127knx7nn0/unreachable-node-not-in-nodetool-ring

FWIW, for people on C* 3.x, this is exposed through nodetool -->
https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsAssassinate.html

Keep in mind that something called both 'unsafe' and 'assassinate' is not
something you want to use in a regular decommissioning process: it drops
the node with no file transfer, so you basically lose the node entirely.
(If the node is already out, which seems to be your case, it should be
safe to use.) I have only used it to fix gossip status, or when forcing a
removenode was not working, and followed it with full repairs on the
remaining nodes.

C*heers,
-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com
