Posted to user@cassandra.apache.org by Jai Bheemsen Rao Dhanwada <ja...@gmail.com> on 2019/06/12 22:54:44 UTC

Decommissioned nodes are in UNREACHABLE state

Hello,

I have a cluster running Cassandra 2.1.16, where I decommissioned a few
nodes using "nodetool decommission", but I still see the node IPs in
UNREACHABLE state in the "nodetool describecluster" output. I believe they
should appear only for 72 hours, but in my case those nodes have been
UNREACHABLE for more than 60 days. A rolling restart of the nodes didn't
remove them. Any idea what could be causing this?

Note: I don't see them in the nodetool status output.

Re: Decommissioned nodes are in UNREACHABLE state

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello,

Wow. This is a 1 year old issue. Are we still talking about the same node?
Other than what I wrote in my previous message, I'm not sure how to guide
you on that one.

> I am wondering, where the information is coming from.

Me too! :).

> I checked system.peers for the IP in UNREACHABLE state and it's not present

Have you looked at all the nodes? This system table is *NOT* distributed,
so running 'cqlsh -e "SELECT * FROM system.peers;"' will give different
results on each node. I think it's enough for a single node to still hold
a row for this IP for it to show up as UNREACHABLE in the 'nodetool
describecluster' output.
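
Since system.peers is local to each node, one way to run that check
everywhere is a small loop. This is only a sketch: the function name
find_ghost_in_peers and the host names in the usage line are made up for
illustration, and it assumes cqlsh is on the PATH and can connect to each
node without extra authentication options.

```shell
# Report which live nodes still hold a row for a given IP in system.peers.
# Hypothetical helper; assumes `cqlsh` can reach each host as-is.
find_ghost_in_peers() (
  ghost_ip=$1
  shift
  for host in "$@"; do
    # system.peers is node-local, so every host is queried separately.
    if cqlsh -e "SELECT peer FROM system.peers;" "$host" | grep -qwF "$ghost_ip"; then
      echo "$host still references $ghost_ip"
    fi
  done
)
```

Usage would be something like 'find_ghost_in_peers 10.0.0.5 node-a node-b
node-c', where the IP and host names are placeholders. No output means none
of the listed nodes has the IP in its system.peers.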

Random ideas and questions:
- Does the corresponding instance still exist?
- Are we speaking of the same node as last year? That would be the
longest lasting ghost node I've heard about :).
- Is it still defined somewhere in your configuration (for example in the
snitch config file cassandra-topology.properties, if you are using
PropertyFileSnitch)?
- What's the C* version you're using? Still 2.1.16?

I somewhat feel you (or I...) might be missing something here: the node has
to be referenced somewhere, otherwise it would disappear on restart.

Good luck with that.

C*heers,
-----------------------
Alain Rodriguez - alain.rodriguez@datastax.com
France / Spain

https://www.datastax.com



On Sat, May 23, 2020 at 19:07, Jai Bheemsen Rao Dhanwada <
jaibheemsen@gmail.com> wrote:

> any inputs here?

Re: Decommissioned nodes are in UNREACHABLE state

Posted by Jai Bheemsen Rao Dhanwada <ja...@gmail.com>.
any inputs here?

On Sat, May 2, 2020 at 12:49 PM Jai Bheemsen Rao Dhanwada <
jaibheemsen@gmail.com> wrote:

> Hello Alain,
>
> Thanks for your suggestions.
>
> Surprisingly, the node in UNREACHABLE state is not present in any of the
> system tables. I am wondering where the information is coming from.
> I checked system.peers for the IP in UNREACHABLE state and it's not
> present. I tried restarting the Cassandra service as well.

Re: Decommissioned nodes are in UNREACHABLE state

Posted by Jai Bheemsen Rao Dhanwada <ja...@gmail.com>.
Hello Alain,

Thanks for your suggestions.

Surprisingly, the node in UNREACHABLE state is not present in any of the
system tables. I am wondering where the information is coming from.
I checked system.peers for the IP in UNREACHABLE state and it's not
present. I tried restarting the Cassandra service as well.

On Thu, Jun 20, 2019 at 5:59 AM Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Hello,
>
> Assuming your nodes have been out for a while and you don't need the data
> after 60 days (or cannot get it anyway), the way to fix this is to force the
> node out. I would try, in this order:
>
> - nodetool removenode HOSTID
> - nodetool removenode force
>
> These 2 might really not work at this stage, but if they do, this is a
> clean way to do so.
> Now, to really push the ghost nodes to the exit door, it often takes:
>
> - nodetool assassinate
>
> I think Cassandra 2.1 doesn't have it, so you might have to use JMX (more
> details here: https://thelastpickle.com/blog/2018/09/18/assassinate.html):
>
> echo "run -b org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint $IP_TO_ASSASSINATE" | java -jar jmxterm-1.0.0-uber.jar -l $IP_OF_LIVE_NODE:7199
>
>
> This really removes every trace of the node, with no safety net: no
> streaming, no checks, it just gets rid of it. So use it with a lot of care
> and understanding. In your situation I guess this is what will work.
>
> As a last attempt, you could try removing traces of the dead node(s) from
> the 'system.peers' table of all the live nodes. This table is local to each
> node, so the DELETE command must be sent to every node that still has a
> trace of an old node.
>
> - cqlsh -e "DELETE FROM system.peers WHERE peer = '$IP_TO_REMOVE';"
>
>> but I see the node IPs in UNREACHABLE state in "nodetool describecluster"
>> output. I believe  they appear only for 72 hours, but in my case I see
>> those nodes in UNREACHABLE for ever (more than 60 days)
>
>
> To be more accurate, I believe you should never see a leaving node as
> unreachable (not even for 72 hours). The 72 hours is how long Gossip should
> keep referencing the old nodes. Typically, once you remove the ghost nodes,
> they should no longer appear in 'nodetool describecluster' at all, I would
> say immediately, but they should still appear in 'nodetool gossipinfo' with
> a 'left' or 'removed' status.
>
> I hope that helps and that one of the above will do the trick (I'd bet on
> the assassinate :)). Also, sorry it took us a while to answer this
> relatively common question :).
>
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com

Re: Decommissioned nodes are in UNREACHABLE state

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello,

Assuming your nodes have been out for a while and you don't need the data
after 60 days (or cannot get it anyway), the way to fix this is to force the
node out. I would try, in this order:

- nodetool removenode HOSTID
- nodetool removenode force

These 2 might really not work at this stage, but if they do, this is a
clean way to do so.
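
removenode takes the host ID rather than the IP, and the ghost nodes here
no longer show up in 'nodetool status'. One place the ID may still be
visible is the 'nodetool gossipinfo' output. A minimal sketch, assuming the
endpoint header lines look like "/10.0.0.5" and the attribute lines contain
"HOST_ID:" (the exact layout varies between Cassandra versions, and
host_id_for_ip is a made-up helper name):

```shell
# Print the HOST_ID that `nodetool gossipinfo` reports for one IP.
# Hypothetical helper: reads gossipinfo text on stdin, the IP is $1.
host_id_for_ip() {
  awk -v ip="$1" '
    /^\//  { current = substr($0, 2); next }      # endpoint header, e.g. "/10.0.0.5"
    current == ip && /HOST_ID:/ {
      n = split($0, parts, ":")                   # the ID is the last ":"-separated field
      print parts[n]
      exit
    }
  '
}
```

It would be used as 'nodetool gossipinfo | host_id_for_ip 10.0.0.5' (the
IP is a placeholder), and the printed ID is what 'nodetool removenode'
expects. If nothing is printed, gossip on that node no longer knows the
endpoint.
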
Now, to really push the ghost nodes to the exit door, it often takes:

- nodetool assassinate

I think Cassandra 2.1 doesn't have it, so you might have to use JMX (more
details here: https://thelastpickle.com/blog/2018/09/18/assassinate.html):

echo "run -b org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint $IP_TO_ASSASSINATE" | java -jar jmxterm-1.0.0-uber.jar -l $IP_OF_LIVE_NODE:7199


This really removes every trace of the node, with no safety net: no
streaming, no checks, it just gets rid of it. So use it with a lot of care
and understanding. In your situation I guess this is what will work.

As a last attempt, you could try removing traces of the dead node(s) from
the 'system.peers' table of all the live nodes. This table is local to each
node, so the DELETE command must be sent to every node that still has a
trace of an old node.

- cqlsh -e "DELETE FROM system.peers WHERE peer = '$IP_TO_REMOVE';"
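
Because the row has to be deleted on every node separately, the statement
can be wrapped in a loop over the live nodes. Sketch only:
delete_peer_everywhere and the host list are hypothetical, cqlsh is assumed
to connect without extra options, and since this edits a system table by
hand it is worth double-checking the IP before running it.

```shell
# Delete one IP from the node-local system.peers table on each listed host.
# Hypothetical helper; system.peers is keyed by the `peer` column.
delete_peer_everywhere() (
  ghost_ip=$1
  shift
  for host in "$@"; do
    echo "removing $ghost_ip from system.peers on $host"
    cqlsh -e "DELETE FROM system.peers WHERE peer = '$ghost_ip';" "$host"
  done
)
```

For example: 'delete_peer_everywhere 10.0.0.5 node-a node-b node-c', with
placeholder addresses.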

> but I see the node IPs in UNREACHABLE state in "nodetool describecluster"
> output. I believe  they appear only for 72 hours, but in my case I see
> those nodes in UNREACHABLE for ever (more than 60 days)


To be more accurate, I believe you should never see a leaving node as
unreachable (not even for 72 hours). The 72 hours is how long Gossip should
keep referencing the old nodes. Typically, once you remove the ghost nodes,
they should no longer appear in 'nodetool describecluster' at all, I would
say immediately, but they should still appear in 'nodetool gossipinfo' with
a 'left' or 'removed' status.
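
To check whether the ghost nodes are really gone, the UNREACHABLE line of
'nodetool describecluster' can be pulled out with a small filter. A sketch,
assuming that line looks like "UNREACHABLE: [10.0.0.5, 10.0.0.6]"
(unreachable_ips is a made-up name):

```shell
# List the IPs that `nodetool describecluster` reports as UNREACHABLE.
# Hypothetical helper: reads the command output on stdin, prints one IP per line.
unreachable_ips() {
  sed -n 's/^[[:space:]]*UNREACHABLE: *\[\(.*\)\].*/\1/p' | tr -d ' ' | tr ',' '\n'
}
```

Run as 'nodetool describecluster | unreachable_ips'. Empty output means the
node you ran it on does not report any endpoint as unreachable.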

I hope that helps and that one of the above will do the trick (I'd bet on
the assassinate :)). Also, sorry it took us a while to answer this
relatively common question :).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, Jun 13, 2019 at 00:55, Jai Bheemsen Rao Dhanwada <
jaibheemsen@gmail.com> wrote:

> Hello,
>
> I have a cluster running Cassandra 2.1.16, where I decommissioned a few
> nodes using "nodetool decommission", but I still see the node IPs in
> UNREACHABLE state in the "nodetool describecluster" output. I believe they
> should appear only for 72 hours, but in my case those nodes have been
> UNREACHABLE for more than 60 days. A rolling restart of the nodes didn't
> remove them. Any idea what could be causing this?
>
> Note: I don't see them in the nodetool status output.
>