
Posted to user@cassandra.apache.org by Alan Gano <AG...@tsys.com> on 2019/06/12 13:54:55 UTC

Recover lost node from backup or evict/re-add?

If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs?

Or is the best way to do an evict/re-add?


Thanks,

Alan.

NOTICE: This communication is intended only for the person or entity to whom it is addressed and may contain confidential, proprietary, and/or privileged material. Unless you are the intended addressee, any review, reliance, dissemination, distribution, copying or use whatsoever of this communication is strictly prohibited. If you received this in error, please reply immediately and delete the material from all computers. Email sent through the Internet is not secure. Do not use email to send us confidential information such as credit card numbers, PIN numbers, passwords, Social Security Numbers, Account numbers, or other important and confidential information.

Re: Recover lost node from backup or evict/re-add?

Posted by Jon Haddad <jo...@jonhaddad.com>.

100% agree with Sean.  I would only use Cassandra backups in a case where
you need to restore from full cluster loss.  Example: An entire DC burns
down, tornado, flooding.

Your routine node replacement after a failure should be
replace_address_first_boot.

To ensure this goes smoothly, run regular repairs.  We (The Last Pickle)
maintain this to make it easy: http://cassandra-reaper.io/

Jon


On Wed, Jun 12, 2019 at 11:17 AM Durity, Sean R <SE...@homedepot.com>
wrote:

> I’m not sure it is correct to say, “you cannot.” However, that is a more
> complicated restore and more likely to lead to inconsistent data and take
> longer to do. You are basically trying to start from a backup point and
> roll everything forward and catch up to current.
>
>
>
> Replacing/re-streaming is the well-trodden path. You are getting the net
> result of all that has happened since the node failure. And the node is not
> returning data to the clients while the bootstrap is running. If you have a
> restored/repairing node, it will accept client (and coordinator)
> connections even though it isn’t (guaranteed) consistent, yet.
>
>
>
> As I understand it – a full cluster recovery from backup still requires
> repair across the cluster to ensure consistency. In my experience, most
> apps cannot wait for a full restore/repair. Availability matters more. They
> also don’t want to pay for even more disk to hold some level of backups.
>
>
>
> There are some companies that provide finer-grained backup and recovery
> options, though.
>
>
>
> Sean Durity
>
>
>
> *From:* Alan Gano <AG...@tsys.com>
> *Sent:* Wednesday, June 12, 2019 1:43 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: Recover lost node from backup or evict/re-add?
>
>
>
>
>
> Is it correct to say that a lost node cannot be restored from backup?  You
> must either replace the node or evict/re-add (i.e., rebuild from other
> nodes).
>
>
>
> Also, that snapshot, incremental, commitlog backups are relegated to
> application keyspace recovery only?
>
>
>
>
>
> How about recovery of the entire cluster? (rolling it back).  Are
> snapshots exact enough, in time, to not have nodes that differ, in
> point-in-time, from the rest of the cluster?  Would those nodes be
> recoverable (nodetool repair?) … which brings me back to recovering a lost
> node from backup (restore last snapshot, and run nodetool repair?).
>
>
>
>
>
> Thanks,
>
>
>
> Alan Gano
>
>
>
>
>
> *From:* Jeff Jirsa [mailto:jjirsa@gmail.com]
> *Sent:* Wednesday, June 12, 2019 10:14 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Recover lost node from backup or evict/re-add?
>
>
>
> A host can replace itself using the method I described
>
>
> On Jun 12, 2019, at 7:10 AM, Alan Gano <AG...@tsys.com> wrote:
>
> I guess I’m considering this scenario:
>
>    - host and configuration have survived
>    - /data is gone
>    - /backups have survived
>
>
>
> I have tested recovering from this scenario with an evict/re-add, which
> worked fine.
>
>
>
> If I restore from backup, the node will be behind the cluster – errrr,
> does it get caught up after I restore it and start it up?
>
>
>
> Alan
>
>
>
> *From:* Jeff Jirsa [mailto:jjirsa@gmail.com]
> *Sent:* Wednesday, June 12, 2019 10:02 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Recover lost node from backup or evict/re-add?
>
>
>
> To avoid violating consistency guarantees, you have to repair the replicas
> while the lost node is down
>
>
>
> Once you do that it’s typically easiest to bootstrap a replacement
> (there’s a property named “replace address first boot” you can google or
> someone can link) that tells a new joining host to take over for a failed
> machine.
>
>
>
>
> On Jun 12, 2019, at 6:54 AM, Alan Gano <AG...@tsys.com> wrote:
>
>
>
> If I lose a node, does it make sense to even restore from
> snapshot/incrementals/commitlogs?
>
>
>
> Or is the best way to do an evict/re-add?
>
>
>
>
>
> Thanks,
>
>
>
> Alan.
>
>
>
> ------------------------------
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>

RE: Recover lost node from backup or evict/re-add?

Posted by "Durity, Sean R" <SE...@homedepot.com>.

I’m not sure it is correct to say, “you cannot.” However, that is a more complicated restore and more likely to lead to inconsistent data and take longer to do. You are basically trying to start from a backup point and roll everything forward and catch up to current.

Replacing/re-streaming is the well-trodden path. You are getting the net result of all that has happened since the node failure. And the node is not returning data to the clients while the bootstrap is running. If you have a restored/repairing node, it will accept client (and coordinator) connections even though it isn’t (guaranteed) consistent, yet.

As I understand it – a full cluster recovery from backup still requires repair across the cluster to ensure consistency. In my experience, most apps cannot wait for a full restore/repair. Availability matters more. They also don’t want to pay for even more disk to hold some level of backups.

There are some companies that provide finer-grained backup and recovery options, though.

Sean Durity

From: Alan Gano <AG...@tsys.com>
Sent: Wednesday, June 12, 2019 1:43 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Recover lost node from backup or evict/re-add?


Is it correct to say that a lost node cannot be restored from backup?  You must either replace the node or evict/re-add (i.e., rebuild from other nodes).

Also, that snapshot, incremental, commitlog backups are relegated to application keyspace recovery only?


How about recovery of the entire cluster? (rolling it back).  Are snapshots exact enough, in time, to not have nodes that differ, in point-in-time, from the rest of the cluster?  Would those nodes be recoverable (nodetool repair?) … which brings me back to recovering a lost node from backup (restore last snapshot, and run nodetool repair?).


Thanks,

Alan Gano


From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Wednesday, June 12, 2019 10:14 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

A host can replace itself using the method I described

On Jun 12, 2019, at 7:10 AM, Alan Gano <AG...@tsys.com> wrote:
I guess I’m considering this scenario:

  *   host and configuration have survived
  *   /data is gone
  *   /backups have survived

I have tested recovering from this scenario with an evict/re-add, which worked fine.

If I restore from backup, the node will be behind the cluster – errrr, does it get caught up after I restore it and start it up?

Alan

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Wednesday, June 12, 2019 10:02 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down

Once you do that it’s typically easiest to bootstrap a replacement (there’s a property named “replace address first boot” you can google or someone can link) that tells a new joining host to take over for a failed machine.


On Jun 12, 2019, at 6:54 AM, Alan Gano <AG...@tsys.com> wrote:

If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs?

Or is the best way to do an evict/re-add?


Thanks,

Alan.



RE: Recover lost node from backup or evict/re-add?

Posted by Alan Gano <AG...@tsys.com>.

Is it correct to say that a lost node cannot be restored from backup?  You must either replace the node or evict/re-add (i.e., rebuild from other nodes).

Also, that snapshot, incremental, commitlog backups are relegated to application keyspace recovery only?


How about recovery of the entire cluster? (rolling it back).  Are snapshots exact enough, in time, to not have nodes that differ, in point-in-time, from the rest of the cluster?  Would those nodes be recoverable (nodetool repair?) … which brings me back to recovering a lost node from backup (restore last snapshot, and run nodetool repair?).


Thanks,

Alan Gano


From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Wednesday, June 12, 2019 10:14 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

A host can replace itself using the method I described

On Jun 12, 2019, at 7:10 AM, Alan Gano <AG...@tsys.com> wrote:
I guess I’m considering this scenario:

  *   host and configuration have survived
  *   /data is gone
  *   /backups have survived

I have tested recovering from this scenario with an evict/re-add, which worked fine.

If I restore from backup, the node will be behind the cluster – errrr, does it get caught up after I restore it and start it up?

Alan

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Wednesday, June 12, 2019 10:02 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down

Once you do that it’s typically easiest to bootstrap a replacement (there’s a property named “replace address first boot” you can google or someone can link) that tells a new joining host to take over for a failed machine.


On Jun 12, 2019, at 6:54 AM, Alan Gano <AG...@tsys.com> wrote:

If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs?

Or is the best way to do an evict/re-add?


Thanks,

Alan.


Re: Recover lost node from backup or evict/re-add?

Posted by Jeff Jirsa <jj...@gmail.com>.

A host can replace itself using the method I described 

> On Jun 12, 2019, at 7:10 AM, Alan Gano <AG...@tsys.com> wrote:
> 
> I guess I’m considering this scenario:
>   *   host and configuration have survived
>   *   /data is gone
>   *   /backups have survived
>  
> I have tested recovering from this scenario with an evict/re-add, which worked fine.
>  
> If I restore from backup, the node will be behind the cluster – errrr, does it get caught up after I restore it and start it up?
>  
> Alan
>  
> From: Jeff Jirsa [mailto:jjirsa@gmail.com] 
> Sent: Wednesday, June 12, 2019 10:02 AM
> To: user@cassandra.apache.org
> Subject: Re: Recover lost node from backup or evict/re-add?
>  
> To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down
>  
> Once you do that it’s typically easiest to bootstrap a replacement (there’s a property named “replace address first boot” you can google or someone can link) that tells a new joining host to take over for a failed machine.
>  
> 
> On Jun 12, 2019, at 6:54 AM, Alan Gano <AG...@tsys.com> wrote:
> 
>  
> If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs?
>  
> Or is the best way to do an evict/re-add?
>  
>  
> Thanks,
>  
> Alan.
>  

RE: Recover lost node from backup or evict/re-add?

Posted by Alan Gano <AG...@tsys.com>.

I guess I’m considering this scenario:

  *   host and configuration have survived
  *   /data is gone
  *   /backups have survived

I have tested recovering from this scenario with an evict/re-add, which worked fine.

If I restore from backup, the node will be behind the cluster – errrr, does it get caught up after I restore it and start it up?

Alan

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Wednesday, June 12, 2019 10:02 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down

Once you do that it’s typically easiest to bootstrap a replacement (there’s a property named “replace address first boot” you can google or someone can link) that tells a new joining host to take over for a failed machine.


On Jun 12, 2019, at 6:54 AM, Alan Gano <AG...@tsys.com> wrote:

If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs?

Or is the best way to do an evict/re-add?


Thanks,

Alan.


Re: Recover lost node from backup or evict/re-add?

Posted by Oleksandr Shulgin <ol...@zalando.de>.

On Thu, Jun 13, 2019 at 3:41 PM Jeff Jirsa <jj...@gmail.com> wrote:

>
> Bootstrapping a new node does not require repairs at all.
>

Was my understanding as well.

> Replacing a node only requires repairs to guarantee consistency to avoid
> violating quorum because streaming for bootstrap only streams from one
> replica
>
> Think this way:
>
> Host 1, 2, 3 in a replica set
> You write value A to some key
> It lands on hosts 1 and 3. Host 2 was being restarted or something
> Host 2 comes back up
> Host 3 fails
>
> If you replace 3 with 3’ -
> 3’ may stream from host 1, and now you’ve got a quorum of replicas with A
> 3’ may stream from host 2, and now you’ve got a quorum of replicas without
> A. This is illegal.
>
> This is just a statistics game - do you have hosts missing writes? If so,
> are hints delivering them when those hosts come back? What’s the cost of
> violating consistency in that second scenario to you?
>
> If you’re running something where correctness really really really
> matters, you must repair first. If you’re actually running a truly eventual
> consistency use case and reading stale writes is fine, you probably won’t
> ever notice.
>

Alright, this makes it much more clear, thank you.

> In any case these docs are weird and wrong - joining nodes get writes in
> all versions of Cassandra for the past few years (at least 2.0+), so the
> docs really need to be fixed.
>

:(

--
Alex

Re: Recover lost node from backup or evict/re-add?

Posted by Jeff Jirsa <jj...@gmail.com>.


> On Jun 13, 2019, at 6:29 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
>> On Thu, Jun 13, 2019 at 3:16 PM Jeff Jirsa <jj...@gmail.com> wrote:
> 
>> On Jun 13, 2019, at 2:52 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
>> On Wed, Jun 12, 2019 at 4:02 PM Jeff Jirsa <jj...@gmail.com> wrote:
>>>> To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down
>>> 
>>> How do you suggest to trigger it?  Potentially replicas of the primary range for the down node are all over the local DC, so I would go with triggering a full cluster repair with Cassandra Reaper.  But isn't it going to fail because of the down node?  
>> I’m not sure there’s an easy and obvious path here - this is something TLP may want to enhance reaper to help with.
>> 
>> You have to specify the ranges with -st/-et, and you have to tell it to ignore the down host with -hosts. With vnodes you’re right that this may be lots and lots of ranges all over the ring.
>> 
>> There’s a patch proposed (maybe committed in 4.0) that makes this a nonissue by allowing bootstrap to stream one repaired set and all of the unrepaired replica data (which is probably very small if you’re running IR regularly), which accomplishes the same thing.
> 
> Ouch, it really hurts to learn this. :(
>>> It is also documented (I believe) that one should repair the node after it finishes the "replace address" procedure.  So should one repair before and after?
>> You do not need to repair after the bootstrap if you repair before. If the docs say that, they’re wrong. The joining host gets writes during bootstrap and consistency levels are altered during bootstrap to account for the joining host.
> 
> This is what I had in mind (what makes replacement different from actual bootstrap of a new node):

Bootstrapping a new node does not require repairs at all.

Replacing a node requires repair only to guarantee consistency, that is, to avoid violating quorum, because streaming for bootstrap only streams from one replica.

Think this way:

Host 1, 2, 3 in a replica set
You write value A to some key
It lands on hosts 1 and 3. Host 2 was being restarted or something
Host 2 comes back up
Host 3 fails

If you replace 3 with 3’ - 
3’ may stream from host 1, and now you’ve got a quorum of replicas with A
3’ may stream from host 2, and now you’ve got a quorum of replicas without A. This is illegal.
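
The three-host scenario above can be replayed as a small simulation. This is only an illustrative sketch, not Cassandra code: replicas are modeled as plain Python sets, and the helper names replace_node and quorum_can_miss are made up for this example. It assumes, as described above, that the replacement streams from exactly one surviving replica.

```python
from itertools import combinations


def replace_node(replicas, dead, source):
    """Return the replica set after `dead` is replaced by a new node
    that bootstraps by copying the data of the single replica `source`."""
    survivors = {name: set(data) for name, data in replicas.items() if name != dead}
    survivors[dead + "'"] = set(replicas[source])  # streamed from one replica only
    return survivors


def quorum_can_miss(replicas, value):
    """True if some majority of replicas exists in which nobody holds `value`."""
    quorum = len(replicas) // 2 + 1
    return any(
        all(value not in replicas[name] for name in group)
        for group in combinations(replicas, quorum)
    )


# Value A was written while host 2 was restarting, so only hosts 1 and 3 hold it.
before = {"1": {"A"}, "2": set(), "3": {"A"}}

# Host 3 fails and is replaced by 3', streaming from host 1 or from host 2.
from_host_1 = replace_node(before, dead="3", source="1")
from_host_2 = replace_node(before, dead="3", source="2")

print(quorum_can_miss(before, "A"))       # False: every quorum sees A
print(quorum_can_miss(from_host_1, "A"))  # False: hosts 1 and 3' both hold A
print(quorum_can_miss(from_host_2, "A"))  # True: hosts 2 and 3' form a quorum without A
```

Repairing hosts 1 and 2 before the replacement removes the bad case, because host 2 would already hold A by the time 3’ streams from it.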

This is just a statistics game - do you have hosts missing writes? If so, are hints delivering them when those hosts come back? What’s the cost of violating consistency in that second scenario to you? 

If you’re running something where correctness really really really matters, you must repair first. If you’re actually running a truly eventual consistency use case and reading stale writes is fine, you probably won’t ever notice.  

In any case these docs are weird and wrong - joining nodes get writes in all versions of Cassandra for the past few years (at least 2.0+), so the docs really need to be fixed.

> http://cassandra.apache.org/doc/latest/operating/topo_changes.html?highlight=replace%20address#replacing-a-dead-node 
> Note
> If any of the following cases apply, you MUST run repair to make the replaced node consistent again, since it missed ongoing writes during/prior to bootstrapping. The replacement timeframe refers to the period from when the node initially dies to when a new node completes the replacement process.
> 
> The node is down for longer than max_hint_window_in_ms before being replaced.
> You are replacing using the same IP address as the dead node and replacement takes longer than max_hint_window_in_ms.
> 
> I would imagine that any production size instance would take way longer to replace than the default max hint window (which is 3 hours, AFAIK).  Didn't remember the same IP restriction, but at least this I would also expect to be the most common setup.
> 
> --
> Alex
>

Re: Recover lost node from backup or evict/re-add?

Posted by Oleksandr Shulgin <ol...@zalando.de>.

On Thu, Jun 13, 2019 at 3:16 PM Jeff Jirsa <jj...@gmail.com> wrote:

> On Jun 13, 2019, at 2:52 AM, Oleksandr Shulgin <
> oleksandr.shulgin@zalando.de> wrote:
> On Wed, Jun 12, 2019 at 4:02 PM Jeff Jirsa <jj...@gmail.com> wrote:
>
>> To avoid violating consistency guarantees, you have to repair the replicas
>> while the lost node is down
>>
>
> How do you suggest to trigger it?  Potentially replicas of the primary
> range for the down node are all over the local DC, so I would go with
> triggering a full cluster repair with Cassandra Reaper.  But isn't it going
> to fail because of the down node?
>
> I’m not sure there’s an easy and obvious path here - this is something TLP
> may want to enhance reaper to help with.
>
> You have to specify the ranges with -st/-et, and you have to tell it to
> ignore the down host with -hosts. With vnodes you’re right that this may be
> lots and lots of ranges all over the ring.
>
> There’s a patch proposed (maybe committed in 4.0) that makes this a
> nonissue by allowing bootstrap to stream one repaired set and all of the
> unrepaired replica data (which is probably very small if you’re running IR
> regularly), which accomplishes the same thing.
>

Ouch, it really hurts to learn this. :(

> It is also documented (I believe) that one should repair the node after it
> finishes the "replace address" procedure.  So should one repair before and
> after?
>
> You do not need to repair after the bootstrap if you repair before. If the
> docs say that, they’re wrong. The joining host gets writes during bootstrap
> and consistency levels are altered during bootstrap to account for the
> joining host.
>

This is what I had in mind (what makes replacement different from actual
bootstrap of a new node):
http://cassandra.apache.org/doc/latest/operating/topo_changes.html?highlight=replace%20address#replacing-a-dead-node


Note

If any of the following cases apply, you MUST run repair to make the replaced
node consistent again, since it missed ongoing writes during/prior to
bootstrapping. The *replacement* timeframe refers to the period from when
the node initially dies to when a new node completes the replacement
process.


   1. The node is down for longer than max_hint_window_in_ms before being
      replaced.
   2. You are replacing using the same IP address as the dead node and
      replacement takes longer than max_hint_window_in_ms.


I would imagine that any production size instance would take way longer to
replace than the default max hint window (which is 3 hours, AFAIK).  Didn't
remember the same IP restriction, but at least this I would also expect to
be the most common setup.
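
To make the two conditions in the quoted note concrete, here is a rough sketch of them as a check. It encodes only what the documentation note says, not any behavior verified against Cassandra itself; the function name repair_required and its parameters are hypothetical, and the 3-hour default is the value mentioned above.

```python
# 3 hours in milliseconds, the default max_hint_window_in_ms mentioned above.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def repair_required(down_before_replace_ms, replacement_ms, same_ip,
                    max_hint_window_ms=DEFAULT_MAX_HINT_WINDOW_MS):
    """Apply the two cases from the quoted docs note.

    down_before_replace_ms: how long the node was down before replacement began.
    replacement_ms: time from the node dying until the replacement completes.
    same_ip: whether the replacement reuses the dead node's IP address.
    """
    if down_before_replace_ms > max_hint_window_ms:
        return True  # case 1: down longer than the hint window
    if same_ip and replacement_ms > max_hint_window_ms:
        return True  # case 2: same IP and replacement outlasts the hint window
    return False


hour = 60 * 60 * 1000
print(repair_required(1 * hour, 8 * hour, same_ip=True))   # True
print(repair_required(1 * hour, 8 * hour, same_ip=False))  # False
print(repair_required(5 * hour, 8 * hour, same_ip=False))  # True
```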

--
Alex

Re: Recover lost node from backup or evict/re-add?

Posted by Jeff Jirsa <jj...@gmail.com>.

> On Jun 13, 2019, at 2:52 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:
> 
>> On Wed, Jun 12, 2019 at 4:02 PM Jeff Jirsa <jj...@gmail.com> wrote:
> 
>> To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down
> 
> How do you suggest to trigger it?  Potentially replicas of the primary range for the down node are all over the local DC, so I would go with triggering a full cluster repair with Cassandra Reaper.  But isn't it going to fail because of the down node?  

I’m not sure there’s an easy and obvious path here - this is something TLP may want to enhance reaper to help with.

You have to specify the ranges with -st/-et, and you have to tell it to ignore the down host with -hosts. With vnodes you’re right that this may be lots and lots of ranges all over the ring.
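
As a rough illustration of what that involves, the sketch below only builds the command lines, one per token range, restricted to the hosts that are still up. The keyspace name, token values, and addresses are placeholders, the helper repair_commands is hypothetical, and repeating -hosts once per host is an assumption about the CLI; check `nodetool help repair` for your version before running anything like this.

```python
import shlex


def repair_commands(keyspace, token_ranges, live_hosts):
    """Build one `nodetool repair` command line per (start, end) token range,
    limited to the given live hosts so the down node is left out."""
    commands = []
    for start, end in token_ranges:
        args = ["nodetool", "repair", "-st", str(start), "-et", str(end)]
        for host in live_hosts:
            args += ["-hosts", host]  # assumption: one -hosts flag per live replica
        args += ["--", keyspace]
        commands.append(shlex.join(args))
    return commands


# Placeholder ranges owned by the dead node and the replicas that are still up.
for command in repair_commands("my_ks", [(0, 100), (100, 200)], ["10.0.0.1", "10.0.0.2"]):
    print(command)
```

With vnodes, the list of ranges would have to come from the ring description of the down node, which is where the "lots and lots of ranges" come in.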

There’s a patch proposed (maybe committed in 4.0) that makes this a nonissue by allowing bootstrap to stream one repaired set and all of the unrepaired replica data (which is probably very small if you’re running IR regularly), which accomplishes the same thing.

> 
> It is also documented (I believe) that one should repair the node after it finishes the "replace address" procedure.  So should one repair before and after?

You do not need to repair after the bootstrap if you repair before. If the docs say that, they’re wrong. The joining host gets writes during bootstrap and consistency levels are altered during bootstrap to account for the joining host.

Re: Recover lost node from backup or evict/re-add?

Posted by Oleksandr Shulgin <ol...@zalando.de>.

On Wed, Jun 12, 2019 at 4:02 PM Jeff Jirsa <jj...@gmail.com> wrote:

> To avoid violating consistency guarantees, you have to repair the replicas
> while the lost node is down
>

How do you suggest to trigger it?  Potentially replicas of the primary
range for the down node are all over the local DC, so I would go with
triggering a full cluster repair with Cassandra Reaper.  But isn't it going
to fail because of the down node?

It is also documented (I believe) that one should repair the node after it
finishes the "replace address" procedure.  So should one repair before and
after?

--
Alex

Re: Recover lost node from backup or evict/re-add?

Posted by Jeff Jirsa <jj...@gmail.com>.

To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down

Once you do that it’s typically easiest to bootstrap a replacement (there’s a property named “replace address first boot” you can google or someone can link) that tells a new joining host to take over for a failed machine.
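
For reference, the property is a JVM system property, cassandra.replace_address_first_boot, set on the replacement node before its first start (commonly by appending it to JVM_OPTS). The snippet below is a minimal sketch that just formats the option for a given dead-node address; the helper name replace_jvm_option and the sample IP are made up for illustration.

```python
import ipaddress


def replace_jvm_option(dead_node_ip):
    """Return the JVM option that tells a joining node to take over
    for the failed node at `dead_node_ip`."""
    ipaddress.ip_address(dead_node_ip)  # raises ValueError on a malformed address
    return f"-Dcassandra.replace_address_first_boot={dead_node_ip}"


# 10.0.0.5 is a placeholder for the address of the failed node.
print(replace_jvm_option("10.0.0.5"))
```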


> On Jun 12, 2019, at 6:54 AM, Alan Gano <AG...@tsys.com> wrote:
> 
>  
> If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs?
>  
> Or is the best way to do an evict/re-add?
>  
>  
> Thanks,
>  
> Alan.
>  