Posted to user@cassandra.apache.org by Fd Habash <fm...@gmail.com> on 2019/03/14 15:42:41 UTC
RE: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
I can conclusively say that none of these commands were run. However, I think this is the likely scenario …
If you have a cluster of three nodes 1,2,3 …
- If 3 shows as DN
- Restart C* on 1 & 2
- Nodetool status should NOT show node 3 IP at all.
Restarting the cluster while a node is down resets gossip state.
There is a good chance this is what happened.
Plausible?
----------------
Thank you
From: Jeff Jirsa
Sent: Thursday, March 14, 2019 11:06 AM
To: cassandra
Subject: Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Two things that wouldn't be a bug:
You could have run removenode
You could have run assassinate
Also could be some new bug, but that's much less likely.
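[Editor's note: the two commands mentioned map to nodetool invocations along the following lines; the host ID and IP below are placeholders in the style of the redacted nodetool status output later in the thread, not values from this cluster. A sketch, not a recommendation:]

```shell
# Remove a dead node by its Host ID (the UUID column in "nodetool status")
nodetool removenode bdbd632a-bf5d-44d4-b220-f17f258c4701

# Last resort: force-remove a node's state from gossip by IP
nodetool assassinate 10.xx.xx.xx
```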
On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
I have a node which I know for certain was a cluster member last week. It showed in nodetool status as DN. When I attempted to replace it today, I got this message:
ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
DN 10.xx.xx.xx 388.43 KB 256 6.9% bdbd632a-bf5d-44d4-b220-f17f258c4701 1e
Under what conditions does this happen?
----------------
Thank you
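[Editor's note: for context, the replacement attempt above is normally made by booting the new node with the replace-address system property, e.g. in cassandra-env.sh on the replacement node (placeholder IP). A sketch for the 2.2 line discussed here:]

```shell
# cassandra-env.sh on the replacement node; remove the flag after the
# replacement completes successfully
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.xx.xx.xx"
```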
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Jeff Jirsa <jj...@gmail.com>.
On Thu, Mar 14, 2019 at 3:42 PM Fd Habash <fm...@gmail.com> wrote:
> I can conclusively say that none of these commands were run. However, I think
> this is the likely scenario …
>
> If you have a cluster of three nodes 1,2,3 …
>
> - If 3 shows as DN
> - Restart C* on 1 & 2
> - Nodetool status should NOT show node 3 IP at all.
>
If you do this, node3 definitely needs to still be present, and it should
still show DN. If it doesn't, ranges move, and consistency will be violated
(aka: really bad).
>
> Restarting the cluster while a node is down resets gossip state.
>
It resets some internal states, but not all of them. It may lose hosts that
have left, but it shouldn't lose any that are simply down.
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Sam Tunnicliffe <sa...@beobal.com>.
Do you have a cassandra-topology.properties file in place? If so, GPFS will instantiate a PropertyFileSnitch using that for compatibility mode. Then, when gossip state doesn't contain any endpoint info about the down node (because you bounced the whole cluster), instead of reading the rack & DC from system.peers it will fall back to the PFS. DC1:r1 is the default in the cassandra-topology.properties shipped in the distro.
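[Editor's note: a minimal sketch of the stock cassandra-topology.properties referred to above; the `default` entry is what unknown endpoints fall back to, and the commented mapping line is an illustrative example, not from this cluster:]

```properties
# cassandra-topology.properties: <node IP>=<datacenter>:<rack>
# 172.19.0.5=dc1:rack1

# default for unknown nodes (as shipped in the distribution)
default=DC1:r1
```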
> On 15 Mar 2019, at 12:04, Jeff Jirsa <jj...@gmail.com> wrote:
>
> Is this using GPFS? If so, can you open a JIRA? It feels like potentially GPFS is not persisting the rack/DC info into system.peers and loses the DC on restart. This is somewhat understandable, but definitely deserves a JIRA.
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Jeff Jirsa <jj...@gmail.com>.
Is this using GPFS? If so, can you open a JIRA? It feels like potentially
GPFS is not persisting the rack/DC info into system.peers and loses the DC
on restart. This is somewhat understandable, but definitely deserves a
JIRA.
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Fd,
I tried this on a 3-node cluster. I killed node2; both node1 and node3
reported node2 as DN. Then I killed node1 and node3, restarted them,
and node2 was reported like this:
[root@spark-master-1 /]# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
DN  172.19.0.8  ?           256     64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.5  382.75 KiB  256     64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
UN  172.19.0.9  171.41 KiB  256     71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3
Before I killed node1 and node3, node2 was indeed marked DN, but it was
part of the "Datacenter: dc1" output alongside node1 and node3.
After killing both node1 and node3 (so the cluster was totally down) and
restarting them, node2 was reported as above.
I do not know what the difference is here. Is gossip data stored somewhere
on disk? I would say so; otherwise there is no way node1 / node3 could
report node2 as down. But at the same time I don't get why it is "out of
the list" that node1 and node3 are in.
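[Editor's note: to the question above, gossip-derived state is indeed persisted locally, largely in the system.peers table. One way to inspect what each surviving node has on disk (a sketch; column set per the 2.x/3.x system schema):]

```shell
# Run on each surviving node and compare the persisted peer metadata,
# including the rack/DC that the snitch recorded for each endpoint
cqlsh -e "SELECT peer, data_center, rack, host_id FROM system.peers;"
```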
Stefan Miklosovic
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Laxmikant Upadhyay <la...@gmail.com>.
Hi Habash,
The reason for the "Cannot replace_address /10.xx.xx.xxx.xx because it
doesn't exist in gossip" error during replace is that the dead node's gossip
information did not survive the restart of the rest of the cluster.
I faced this issue before; you can read how I resolved it at the link below:
https://github.com/laxmikant99/cassandra-single-node-disater-recovery-lessons
regards,
Laxmikant
--
regards,
Laxmikant Upadhyay
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Stefan Miklosovic <st...@instaclustr.com>.
It is just C* in Docker Compose with static IP addresses as long as all
containers run. I am just killing the Cassandra process and starting it
again in each container.
--
Stefan Miklosovic
Senior Software Engineer, Instaclustr
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip
Posted by Jeff Jirsa <jj...@gmail.com>.
Are your IPs changing as you restart the cluster? Kubernetes or Mesos or something where your data gets scheduled on different machines? If so, if it gets an IP that was previously in the cluster, it’ll stomp on the old entry in the gossiper maps
--
Jeff Jirsa
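[Editor's note: one way to see what the gossiper currently holds for each endpoint, and hence whether an old entry has been stomped or dropped; output format varies by version:]

```shell
# Dump per-endpoint gossip state, including STATUS, DC, RACK, and HOST_ID
# application states, on any live node
nodetool gossipinfo
```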