Posted to user@cassandra.apache.org by Fd Habash <fm...@gmail.com> on 2019/03/14 15:42:41 UTC

RE: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

I can conclusively say that none of these commands were run. However, I think this is the likely scenario …

If you have a cluster of three nodes 1,2,3 …
- If 3 shows as DN
- Restart C* on 1 & 2
- Nodetool status should NOT show node 3 IP at all.

Restarting the cluster while a node is down resets gossip state. 

There is a good chance this is what happened. 

Plausible? 

----------------
Thank you

From: Jeff Jirsa
Sent: Thursday, March 14, 2019 11:06 AM
To: cassandra
Subject: Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Two things that wouldn't be a bug:

You could have run removenode
You could have run assassinate

Also could be some new bug, but that's much less likely. 
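
For reference (not saying either was run here, just what they would look like):
removenode takes the down node's host ID and assassinate takes its IP. The
assassinate subcommand only exists in newer nodetool versions; on older
releases it is typically done through the Gossiper JMX MBean
(unsafeAssassinateEndpoint).

    nodetool removenode bdbd632a-bf5d-44d4-b220-f17f258c4701    # host ID of the DN node
    nodetool assassinate 10.xx.xx.xx                            # newer releases only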


On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
I have a node which I know for certain was a cluster member last week. It showed in nodetool status as DN. When I attempted to replace it today, I got this message 
 
ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
        at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
 
 
DN  10.xx.xx.xx  388.43 KB  256          6.9%              bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
 
Under what conditions does this happen?
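
For context, a replacement on 2.2 is normally attempted by starting the new
node with the replace_address system property, roughly like this (where
exactly the option is set depends on the install):

    # in cassandra-env.sh (or the JVM options) of the replacement node
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.xx.xx.xx"

The exception above is thrown when the replacement's shadow gossip round with
the seeds returns no endpoint state for that address.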
 
 
----------------
Thank you
 


Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Jeff Jirsa <jj...@gmail.com>.
On Thu, Mar 14, 2019 at 3:42 PM Fd Habash <fm...@gmail.com> wrote:

> I can conclusively say, none of these commands were run. However, I think
> this is  the likely scenario …
>
>
>
> If you have a cluster of three nodes 1,2,3 …
>
>    - If 3 shows as DN
>    - Restart C* on 1 & 2
>    - Nodetool status should NOT show node 3 IP at all.
>
>
If you do this, node3 definitely needs to still be present, and it should
still show DN. If it doesn't, ranges move, and consistency will be violated
(aka: really bad).


>
>
> Restarting the cluster while a node is down resets gossip state.
>

It resets some internal states, but not all of them. It may lose hosts that
have left, but it shouldn't lose any that are simply down.
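
A quick sanity check after such a restart (not a fix, just to see what state
survived) is to compare the ring view with the live gossip state on one of the
restarted nodes:

    nodetool status       # the down node should still be listed here as DN
    nodetool gossipinfo   # endpoint states this node currently holds in gossip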


>
>
> There is a good chance this is what happened.
>
>
>
> Plausible?
>
>
>
> ----------------
> Thank you
>
>
>
> *From: *Jeff Jirsa <jj...@gmail.com>
> *Sent: *Thursday, March 14, 2019 11:06 AM
> *To: *cassandra <us...@cassandra.apache.org>
> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
> exist ingossip
>
>
>
> Two things that wouldn't be a bug:
>
>
>
> You could have run removenode
>
> You could have run assassinate
>
>
>
> Also could be some new bug, but that's much less likely.
>
>
>
>
>
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
>
> I have a node which I know for certain was a cluster member last week. It
> showed in nodetool status as DN. When I attempted to replace it today, I
> got this message
>
>
>
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
> encountered during startup
>
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
> because it doesn't exist in gossip
>
>         at
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
> ~[apache-cassandra-2.2.8.jar:2.2.8]
>
>
>
>
>
> DN  10.xx.xx.xx  388.43 KB  256          6.9%
> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>
>
>
> Under what conditions does this happen?
>
>
>
>
>
> ----------------
> Thank you
>
>
>
>
>

Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Sam Tunnicliffe <sa...@beobal.com>.
Do you have a cassandra-topology.properties file in place? If so, GPFS will instantiate a PropertyFileSnitch using that for compatibility mode. Then, when gossip state doesn't contain any endpoint info about the down node (because you bounced the whole cluster), instead of reading the rack & DC from system.peers, it will fall back to the PFS. DC1:r1 is the default in the cassandra-topology.properties shipped in the distro.
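
For reference, the stock cassandra-topology.properties shipped with the
distribution ends with a default entry along these lines, which is where that
DC1/r1 placement comes from for any endpoint the snitch cannot otherwise
resolve:

    # default for unknown nodes
    default=DC1:r1

So if GPFS is intended and PropertyFileSnitch compatibility isn't needed,
removing (or correcting) that file avoids the fallback placement.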

> On 15 Mar 2019, at 12:04, Jeff Jirsa <jj...@gmail.com> wrote:
> 
> Is this using GPFS?  If so, can you open a JIRA? It feels like potentially GPFS is not persisting the rack/DC info into system.peers and loses the DC on restart. This is somewhat understandable, but definitely deserves a JIRA. 
> 
> On Thu, Mar 14, 2019 at 11:44 PM Stefan Miklosovic <stefan.miklosovic@instaclustr.com <ma...@instaclustr.com>> wrote:
> Hi Fd,
> 
> I tried this on 3 nodes cluster. I killed node 2, both node1 and node3 reported node2 to be DN, then I killed node1 and node3 and I restarted them and node2 was reported like this:
> 
> [root@spark-master-1 /]# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
> DN  172.19.0.8  ?          256          64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
> UN  172.19.0.5  382.75 KiB  256          64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
> UN  172.19.0.9  171.41 KiB  256          71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3
> 
> Prior to killing of node1 and node3, node2 was indeed marked as DN but it was part of the "Datacenter: dc1" output where both node1 and node3 were.
> 
> But after killing both node1 and node3 (so cluster was totally down), after restarting them, node2 was reported like that.
> 
> I do not know what is the difference here. Are gossiping data somewhere stored on the disk? I would say so, otherwise there is no way how could node1 / node3 report 
> that node2 is down but at the same time I dont get why it is "out of the list" where node1 and node3 are.
> 
> 
> On Fri, 15 Mar 2019 at 02:42, Fd Habash <fmhabash@gmail.com <ma...@gmail.com>> wrote:
> I can conclusively say, none of these commands were run. However, I think this is  the likely scenario …
> 
>  
> 
> If you have a cluster of three nodes 1,2,3 …
> 
> If 3 shows as DN
> Restart C* on 1 & 2
> Nodetool status should NOT show node 3 IP at all.
>  
> 
> Restarting the cluster while a node is down resets gossip state.
> 
>  
> 
> There is a good chance this is what happened.
> 
>  
> 
> Plausible?
> 
>  
> 
> ----------------
> Thank you
> 
>  
> 
> From: Jeff Jirsa <ma...@gmail.com>
> Sent: Thursday, March 14, 2019 11:06 AM
> To: cassandra <ma...@cassandra.apache.org>
> Subject: Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip
> 
>  
> 
> Two things that wouldn't be a bug:
> 
>  
> 
> You could have run removenode
> 
> You could have run assassinate
> 
>  
> 
> Also could be some new bug, but that's much less likely. 
> 
>  
> 
>  
> 
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fmhabash@gmail.com <ma...@gmail.com>> wrote:
> 
> I have a node which I know for certain was a cluster member last week. It showed in nodetool status as DN. When I attempted to replace it today, I got this message
> 
>  
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
> 
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
> 
>         at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
> 
>  
>  
> DN  10.xx.xx.xx  388.43 KB  256          6.9%              bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
> 
>  
> Under what conditions does this happen?
> 
>  
>  
> ----------------
> Thank you
> 
>  
>  
> 
> 
> Stefan Miklosovic
> 


Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Jeff Jirsa <jj...@gmail.com>.
Is this using GPFS? If so, can you open a JIRA? It feels like GPFS is
potentially not persisting the rack/DC info into system.peers and losing the
DC on restart. This is somewhat understandable, but definitely deserves a
JIRA.
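
One way to check that theory on a surviving node (assuming cqlsh access) is to
look at what was actually persisted for the peers:

    cqlsh -e "SELECT peer, data_center, rack FROM system.peers;"

If the rack/DC columns are empty, or missing for the down node, that would
support the snitch not persisting them.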

On Thu, Mar 14, 2019 at 11:44 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> Hi Fd,
>
> I tried this on 3 nodes cluster. I killed node 2, both node1 and node3
> reported node2 to be DN, then I killed node1 and node3 and I restarted them
> and node2 was reported like this:
>
> [root@spark-master-1 /]# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns (effective)  Host ID
>                      Rack
> DN  172.19.0.8  ?          256          64.0%
>  bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns (effective)  Host ID
>                      Rack
> UN  172.19.0.5  382.75 KiB  256          64.4%
>  2a062140-2428-4092-b48b-7495d083d7f9  rack1
> UN  172.19.0.9  171.41 KiB  256          71.6%
>  9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3
>
> Prior to killing of node1 and node3, node2 was indeed marked as DN but it
> was part of the "Datacenter: dc1" output where both node1 and node3 were.
>
> But after killing both node1 and node3 (so cluster was totally down),
> after restarting them, node2 was reported like that.
>
> I do not know what is the difference here. Are gossiping data somewhere
> stored on the disk? I would say so, otherwise there is no way how could
> node1 / node3 report
> that node2 is down but at the same time I dont get why it is "out of the
> list" where node1 and node3 are.
>
>
> On Fri, 15 Mar 2019 at 02:42, Fd Habash <fm...@gmail.com> wrote:
>
>> I can conclusively say, none of these commands were run. However, I think
>> this is  the likely scenario …
>>
>>
>>
>> If you have a cluster of three nodes 1,2,3 …
>>
>>    - If 3 shows as DN
>>    - Restart C* on 1 & 2
>>    - Nodetool status should NOT show node 3 IP at all.
>>
>>
>>
>> Restarting the cluster while a node is down resets gossip state.
>>
>>
>>
>> There is a good chance this is what happened.
>>
>>
>>
>> Plausible?
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>> *From: *Jeff Jirsa <jj...@gmail.com>
>> *Sent: *Thursday, March 14, 2019 11:06 AM
>> *To: *cassandra <us...@cassandra.apache.org>
>> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
>> exist ingossip
>>
>>
>>
>> Two things that wouldn't be a bug:
>>
>>
>>
>> You could have run removenode
>>
>> You could have run assassinate
>>
>>
>>
>> Also could be some new bug, but that's much less likely.
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
>>
>> I have a node which I know for certain was a cluster member last week. It
>> showed in nodetool status as DN. When I attempted to replace it today, I
>> got this message
>>
>>
>>
>> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
>> encountered during startup
>>
>> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
>> because it doesn't exist in gossip
>>
>>         at
>> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
>> ~[apache-cassandra-2.2.8.jar:2.2.8]
>>
>>
>>
>>
>>
>> DN  10.xx.xx.xx  388.43 KB  256          6.9%
>> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>>
>>
>>
>> Under what conditions does this happen?
>>
>>
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>>
>>
>
> Stefan Miklosovic
>
>

Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Fd,

I tried this on a 3-node cluster. I killed node2; both node1 and node3
reported node2 as DN. Then I killed node1 and node3, restarted them, and
node2 was reported like this:

[root@spark-master-1 /]# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens       Owns (effective)  Host ID                               Rack
DN  172.19.0.8  ?           256          64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens       Owns (effective)  Host ID                               Rack
UN  172.19.0.5  382.75 KiB  256          64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
UN  172.19.0.9  171.41 KiB  256          71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3

Prior to killing node1 and node3, node2 was indeed marked as DN, but it was
part of the "Datacenter: dc1" output where both node1 and node3 were.

But after killing both node1 and node3 (so the cluster was totally down) and
restarting them, node2 was reported as shown above.

I do not know what the difference is here. Is gossip data stored somewhere on
disk? I would say so, otherwise there is no way node1 / node3 could report
that node2 is down, but at the same time I don't get why it is "out of the
list" where node1 and node3 are.


On Fri, 15 Mar 2019 at 02:42, Fd Habash <fm...@gmail.com> wrote:

> I can conclusively say, none of these commands were run. However, I think
> this is  the likely scenario …
>
>
>
> If you have a cluster of three nodes 1,2,3 …
>
>    - If 3 shows as DN
>    - Restart C* on 1 & 2
>    - Nodetool status should NOT show node 3 IP at all.
>
>
>
> Restarting the cluster while a node is down resets gossip state.
>
>
>
> There is a good chance this is what happened.
>
>
>
> Plausible?
>
>
>
> ----------------
> Thank you
>
>
>
> *From: *Jeff Jirsa <jj...@gmail.com>
> *Sent: *Thursday, March 14, 2019 11:06 AM
> *To: *cassandra <us...@cassandra.apache.org>
> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
> exist ingossip
>
>
>
> Two things that wouldn't be a bug:
>
>
>
> You could have run removenode
>
> You could have run assassinate
>
>
>
> Also could be some new bug, but that's much less likely.
>
>
>
>
>
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
>
> I have a node which I know for certain was a cluster member last week. It
> showed in nodetool status as DN. When I attempted to replace it today, I
> got this message
>
>
>
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
> encountered during startup
>
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
> because it doesn't exist in gossip
>
>         at
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
> ~[apache-cassandra-2.2.8.jar:2.2.8]
>
>
>
>
>
> DN  10.xx.xx.xx  388.43 KB  256          6.9%
> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>
>
>
> Under what conditions does this happen?
>
>
>
>
>
> ----------------
> Thank you
>
>
>
>
>

Stefan Miklosovic

Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Laxmikant Upadhyay <la...@gmail.com>.
Hi Habash,

The reason for the "Cannot replace_address /10.xx.xx.xxx.xx because it doesn't
exist in gossip" error during replace is that the dead node's gossip
information did not survive the full restart of the cluster (the rest of the
nodes).

I faced this issue before; you can read about my experience of resolving it
at the link below:
https://github.com/laxmikant99/cassandra-single-node-disater-recovery-lessons

regards,
Laxmikant



On Fri, Mar 15, 2019 at 5:21 AM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> It is just a C* in Docker Compose with static IP addresses as long as all
> containers run. I am just killing Cassandra process and starting it again
> in each container.
>
> On Fri, 15 Mar 2019 at 10:47, Jeff Jirsa <jj...@gmail.com> wrote:
>
>> Are your IPs changing as you restart the cluster? Kubernetes or Mesos or
>> something where your data gets scheduled on different machines? If so, if
>> it gets an IP that was previously in the cluster, it’ll stomp on the old
>> entry in the gossiper maps
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Mar 14, 2019, at 3:42 PM, Fd Habash <fm...@gmail.com> wrote:
>>
>> I can conclusively say, none of these commands were run. However, I think
>> this is  the likely scenario …
>>
>>
>>
>> If you have a cluster of three nodes 1,2,3 …
>>
>>    - If 3 shows as DN
>>    - Restart C* on 1 & 2
>>    - Nodetool status should NOT show node 3 IP at all.
>>
>>
>>
>> Restarting the cluster while a node is down resets gossip state.
>>
>>
>>
>> There is a good chance this is what happened.
>>
>>
>>
>> Plausible?
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>> *From: *Jeff Jirsa <jj...@gmail.com>
>> *Sent: *Thursday, March 14, 2019 11:06 AM
>> *To: *cassandra <us...@cassandra.apache.org>
>> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
>> exist ingossip
>>
>>
>>
>> Two things that wouldn't be a bug:
>>
>>
>>
>> You could have run removenode
>>
>> You could have run assassinate
>>
>>
>>
>> Also could be some new bug, but that's much less likely.
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
>>
>> I have a node which I know for certain was a cluster member last week. It
>> showed in nodetool status as DN. When I attempted to replace it today, I
>> got this message
>>
>>
>>
>> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
>> encountered during startup
>>
>> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
>> because it doesn't exist in gossip
>>
>>         at
>> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
>> ~[apache-cassandra-2.2.8.jar:2.2.8]
>>
>>
>>
>>
>>
>> DN  10.xx.xx.xx  388.43 KB  256          6.9%
>> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>>
>>
>>
>> Under what conditions does this happen?
>>
>>
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>>
>>
>>
>
> --
>
>
> *Stefan Miklosovic**Senior Software Engineer*
>
>
> M: +61459911436
>
> <https://www.instaclustr.com>
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
>    <https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
>


-- 

regards,
Laxmikant Upadhyay

Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Stefan Miklosovic <st...@instaclustr.com>.
It is just C* in Docker Compose, with static IP addresses as long as all the
containers run. I am just killing the Cassandra process and starting it again
in each container.

On Fri, 15 Mar 2019 at 10:47, Jeff Jirsa <jj...@gmail.com> wrote:

> Are your IPs changing as you restart the cluster? Kubernetes or Mesos or
> something where your data gets scheduled on different machines? If so, if
> it gets an IP that was previously in the cluster, it’ll stomp on the old
> entry in the gossiper maps
>
>
>
> --
> Jeff Jirsa
>
>
> On Mar 14, 2019, at 3:42 PM, Fd Habash <fm...@gmail.com> wrote:
>
> I can conclusively say, none of these commands were run. However, I think
> this is  the likely scenario …
>
>
>
> If you have a cluster of three nodes 1,2,3 …
>
>    - If 3 shows as DN
>    - Restart C* on 1 & 2
>    - Nodetool status should NOT show node 3 IP at all.
>
>
>
> Restarting the cluster while a node is down resets gossip state.
>
>
>
> There is a good chance this is what happened.
>
>
>
> Plausible?
>
>
>
> ----------------
> Thank you
>
>
>
> *From: *Jeff Jirsa <jj...@gmail.com>
> *Sent: *Thursday, March 14, 2019 11:06 AM
> *To: *cassandra <us...@cassandra.apache.org>
> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
> exist ingossip
>
>
>
> Two things that wouldn't be a bug:
>
>
>
> You could have run removenode
>
> You could have run assassinate
>
>
>
> Also could be some new bug, but that's much less likely.
>
>
>
>
>
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
>
> I have a node which I know for certain was a cluster member last week. It
> showed in nodetool status as DN. When I attempted to replace it today, I
> got this message
>
>
>
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
> encountered during startup
>
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
> because it doesn't exist in gossip
>
>         at
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
> ~[apache-cassandra-2.2.8.jar:2.2.8]
>
>
>
>
>
> DN  10.xx.xx.xx  388.43 KB  256          6.9%
> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>
>
>
> Under what conditions does this happen?
>
>
>
>
>
> ----------------
> Thank you
>
>
>
>
>
>

-- 


*Stefan Miklosovic**Senior Software Engineer*


M: +61459911436

<https://www.instaclustr.com>

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.


Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip

Posted by Jeff Jirsa <jj...@gmail.com>.
Are your IPs changing as you restart the cluster? Kubernetes or Mesos or something where your data gets scheduled on different machines? If so, and a node gets an IP that was previously in the cluster, it'll stomp on the old entry in the gossiper maps.



-- 
Jeff Jirsa


> On Mar 14, 2019, at 3:42 PM, Fd Habash <fm...@gmail.com> wrote:
> 
> I can conclusively say, none of these commands were run. However, I think this is  the likely scenario …
>  
> If you have a cluster of three nodes 1,2,3 …
> If 3 shows as DN
> Restart C* on 1 & 2
> Nodetool status should NOT show node 3 IP at all.
>  
> Restarting the cluster while a node is down resets gossip state.
>  
> There is a good chance this is what happened.
>  
> Plausible?
>  
> ----------------
> Thank you
>  
> From: Jeff Jirsa
> Sent: Thursday, March 14, 2019 11:06 AM
> To: cassandra
> Subject: Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip
>  
> Two things that wouldn't be a bug:
>  
> You could have run removenode
> You could have run assassinate
>  
> Also could be some new bug, but that's much less likely. 
>  
>  
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fm...@gmail.com> wrote:
> I have a node which I know for certain was a cluster member last week. It showed in nodetool status as DN. When I attempted to replace it today, I got this message
>  
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
>         at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
>  
>  
> DN  10.xx.xx.xx  388.43 KB  256          6.9%              bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>  
> Under what conditions does this happen?
>  
>  
> ----------------
> Thank you
>  
>