Posted to user@cassandra.apache.org by Vincent Rischmann <vi...@rischmann.fr> on 2019/08/28 14:33:24 UTC
gossipinfo contains two nodes dead for more than two years
Hi,
While replacing a node in a cluster, I saw this log line:
2019-08-27 16:35:31,439 Gossiper.java:995 - InetAddress /10.15.53.27 is now DOWN
It caught my attention because that IP address no longer exists in the cluster, and hasn't for a long time.
After some reading, I ran `nodetool gossipinfo` and saw these entries for nodes that no longer exist:
/10.15.53.27
generation:1503480618
heartbeat:26970
STATUS:2:hibernate,true
LOAD:26810:6.17363354147E11
SCHEMA:101:d21b1e47-f226-3417-8de7-5802518ae824
DC:10:DC1
RACK:12:RAC1
RELEASE_VERSION:6:2.1.18
INTERNAL_IP:8:10.15.53.27
RPC_ADDRESS:5:10.15.53.27
SEVERITY:26972:0.0
NET_VERSION:3:8
HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
TOKENS:1:<hidden>
/10.5.1.16
generation:1503636779
heartbeat:324
STATUS:2:hibernate,true
LOAD:204:2.601990697532E12
SCHEMA:14:d21b1e47-f226-3417-8de7-5802518ae824
DC:10:DC1
RACK:12:RAC1
RELEASE_VERSION:6:2.1.18
INTERNAL_IP:8:10.5.1.16
RPC_ADDRESS:5:10.5.1.16
SEVERITY:326:0.0
NET_VERSION:3:8
HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
TOKENS:1:<hidden>
The generations correspond to these dates:
- Wed, 23 Aug 2017 09:30:18 GMT
- Fri, 25 Aug 2017 04:52:59 GMT
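The dates above are consistent with reading the gossip `generation` field as a Unix timestamp in seconds. A minimal Python sketch of that conversion follows; the helper name `generation_to_gmt` is hypothetical and only used for illustration:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def generation_to_gmt(generation: int) -> str:
    """Render a gossip generation (assumed to be Unix seconds) as a GMT date string."""
    moment = datetime.fromtimestamp(generation, tz=timezone.utc)
    # format_datetime is locale-independent; usegmt=True emits "GMT" instead of "+0000".
    return format_datetime(moment, usegmt=True)


if __name__ == "__main__":
    # Generation values copied from the gossipinfo output in this thread.
    for generation in (1503480618, 1503636779):
        print(generation, "->", generation_to_gmt(generation))
    # 1503480618 -> Wed, 23 Aug 2017 09:30:18 GMT
    # 1503636779 -> Fri, 25 Aug 2017 04:52:59 GMT
```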
I don't remember what we did at that time, but it looks like we botched something while joining a node.
After reading https://thelastpickle.com/blog/2018/09/18/assassinate.html, I'm thinking of doing the following:
* nodetool removenode 10.15.53.27
* if it doesn't work for some reason: nodetool assassinate 10.15.53.27
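The two-step plan above (try `removenode`, fall back to `assassinate`) can be sketched in Python as below. This is only an illustration under stated assumptions: to my knowledge `nodetool removenode` expects the node's host ID (the `HOST_ID` value from gossipinfo) rather than an IP address, while `nodetool assassinate` takes the IP address; the sketch also assumes `nodetool` exits with a non-zero status on failure. The function names `cleanup_commands` and `run_until_success` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def cleanup_commands(host_id: str, ip_address: str) -> List[List[str]]:
    """Build the commands in the order they would be attempted."""
    return [
        # Assumption: removenode is addressed by host ID, not by IP.
        ["nodetool", "removenode", host_id],
        # Fallback: assassinate is addressed by IP.
        ["nodetool", "assassinate", ip_address],
    ]


def run_until_success(
    commands: Sequence[Sequence[str]],
    runner: Callable = subprocess.run,
) -> int:
    """Run commands in order and stop at the first one that exits with 0.

    Returns the index of the command that succeeded, or -1 if none did.
    Assumption: a failed nodetool call reports a non-zero return code.
    """
    for index, command in enumerate(commands):
        result = runner(list(command))
        if result.returncode == 0:
            return index
    return -1


if __name__ == "__main__":
    # Values taken from the gossipinfo entry for /10.15.53.27 in this thread.
    plan = cleanup_commands("2488fccc-108a-4a9d-ad43-5e8b8b6ee17b", "10.15.53.27")
    for command in plan:
        print(" ".join(command))  # dry run: print only, execute nothing
```

Printing the plan first, as the `__main__` block does, keeps the sketch a dry run; `run_until_success(plan)` would actually invoke `nodetool` on the local node.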
Since those nodes have been dead for a long time and don't appear in system.peers, I don't anticipate any problems, but I'd like some confirmation that this can't break my cluster.
Thanks!
Re: gossipinfo contains two nodes dead for more than two years
Posted by John Sumsion <Su...@familysearch.org>.
I've seen something similar if there is a node still referring to that IP as a seed node in cassandra.yaml. You might want to check that.
________________________________
From: Vincent Rischmann <vi...@rischmann.fr>
Sent: Wednesday, August 28, 2019 10:10 AM
To: user@cassandra.apache.org <us...@cassandra.apache.org>
Subject: Re: gossipinfo contains two nodes dead for more than two years
Yep, they're not visible in either ring or status.
On Wed, Aug 28, 2019, at 17:08, Jeff Jirsa wrote:
Based on what you've posted, I assume the instances are not visible in `nodetool ring` or `nodetool status`, and the only reason you know they're still in gossipinfo is you see them in the logs? If that's the case, then yes, I would do `nodetool assassinate`.
Re: gossipinfo contains two nodes dead for more than two years
Posted by Vincent Rischmann <vi...@rischmann.fr>.
Yep, they're not visible in either ring or status.
On Wed, Aug 28, 2019, at 17:08, Jeff Jirsa wrote:
> Based on what you've posted, I assume the instances are not visible in `nodetool ring` or `nodetool status`, and the only reason you know they're still in gossipinfo is you see them in the logs? If that's the case, then yes, I would do `nodetool assassinate`.
Re: gossipinfo contains two nodes dead for more than two years
Posted by Jeff Jirsa <jj...@gmail.com>.
Based on what you've posted, I assume the instances are not visible in
`nodetool ring` or `nodetool status`, and the only reason you know they're
still in gossipinfo is you see them in the logs? If that's the case, then
yes, I would do `nodetool assassinate`.
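The check described above (endpoints known to gossip but absent from ring/status) can be sketched in Python. The parser below assumes the `nodetool gossipinfo` layout shown in this thread for 2.1.18: an endpoint line starting with `/`, followed by `KEY:version:value` lines plus `generation:N` and `heartbeat:N`; other versions may print a different layout. The set of live addresses is supplied by the caller (for example, copied by hand from `nodetool status`), and the names `parse_gossipinfo` and `stale_endpoints` are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_gossipinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Parse gossipinfo text into {endpoint_ip: {field: value}}."""
    endpoints: Dict[str, Dict[str, str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A new endpoint section, e.g. "/10.15.53.27".
            current = endpoints.setdefault(line[1:], {})
            continue
        if current is None:
            continue  # ignore anything before the first endpoint
        # "STATUS:2:hibernate,true" -> key "STATUS", value "hibernate,true"
        # "generation:1503480618"  -> key "generation", value "1503480618"
        parts = line.split(":", 2)
        current[parts[0]] = parts[-1]
    return endpoints


def stale_endpoints(
    gossip: Dict[str, Dict[str, str]], live_ips: Iterable[str]
) -> List[str]:
    """Return endpoints present in gossip but not in the caller's live set."""
    live = set(live_ips)
    return sorted(ip for ip in gossip if ip not in live)


if __name__ == "__main__":
    sample = """\
/10.15.53.27
generation:1503480618
heartbeat:26970
STATUS:2:hibernate,true
/10.5.1.16
generation:1503636779
heartbeat:324
STATUS:2:hibernate,true
"""
    gossip = parse_gossipinfo(sample)
    # "10.0.0.1" is a made-up live address, standing in for the real cluster members.
    for ip in stale_endpoints(gossip, live_ips={"10.0.0.1"}):
        print(ip, gossip[ip].get("STATUS"), gossip[ip].get("generation"))
```

Anything this reports is only a candidate for `nodetool assassinate`; confirming against `nodetool ring` and `nodetool status` on more than one node remains a manual step.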