Posted to user@cassandra.apache.org by Mark Furlong <mf...@ancestry.com> on 2017/10/06 17:20:47 UTC

Node failure

What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed?

Mark Furlong

Sr. Database Administrator

mfurlong@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043




Re: Node failure

Posted by Jon Haddad <jo...@jonhaddad.com>.
I’ve had a few use cases for downgrading consistency over the years.  If you’re showing a customer dashboard w/ some Ad summary data, it’s great to be right, but showing a number that’s close is better than not being up.
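To make that concrete, a minimal sketch with the 3.x Java driver (contact point, keyspace, and table are placeholder names; the fallback is hand-rolled in the application rather than a driver policy): try the read at QUORUM and drop to ONE only when the cluster can't assemble a quorum.

    import com.datastax.driver.core.*;
    import com.datastax.driver.core.exceptions.UnavailableException;

    public class DashboardRead {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ads");   // hypothetical keyspace

            Statement read = new SimpleStatement(
                    "SELECT impressions FROM summary WHERE day = '2017-10-06'");
            ResultSet rs;
            try {
                // Preferred path: strongly consistent read.
                rs = session.execute(read.setConsistencyLevel(ConsistencyLevel.QUORUM));
            } catch (UnavailableException e) {
                // Not enough live replicas for QUORUM; a close-enough number
                // beats an error page, so fall back to ONE.
                rs = session.execute(read.setConsistencyLevel(ConsistencyLevel.ONE));
            }
            System.out.println(rs.one());
            cluster.close();
        }
    }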

> On Oct 6, 2017, at 1:32 PM, Jeff Jirsa <jj...@gmail.com> wrote:
> 
> I think it was Brandon that used to make a pretty compelling argument that downgrading consistency on writes was always wrong, because if you can tolerate the lower consistency, you should just use the lower consistency from the start (because cassandra is still going to send the write to all replicas, anyway). 
> 
> On Fri, Oct 6, 2017 at 12:51 PM, Jim Witschey <jim.witschey@datastax.com> wrote:
> > Modern client drivers also have ways to “downgrade” the CL of requests, in case they fail. E.g. for the Java driver: http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
> 
> Quick note from a driver dev's perspective: Mark, yours sounds like a
> bad use case for a downgrading retry policy. If your cluster has an RF
> of 2 and your app requires CL.QUORUM, a downgrading policy will, for
> example, either try at CL.QUORUM and downgrade below your required CL,
> or try at CL.ALL, fail, and then downgrade to CL.QUORUM or an
> equivalent, which is what your app needed in the first place.
> 


Re: Node failure

Posted by Jeff Jirsa <jj...@gmail.com>.
I think it was Brandon that used to make a pretty compelling argument that
downgrading consistency on writes was always wrong, because if you can
tolerate the lower consistency, you should just use the lower consistency
from the start (because cassandra is still going to send the write to all
replicas, anyway).
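A rough sketch of what "use the lower consistency from the start" looks like with the 3.x Java driver (contact point, keyspace, and table are placeholder names): declare the level once as the session default instead of bolting on a downgrade path.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;

    public class WriteAtChosenCl {
        public static void main(String[] args) {
            // If the application can live with ONE, declare that up front.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.ONE))
                    .build();
            Session session = cluster.connect("my_ks");   // hypothetical keyspace

            // The coordinator still forwards the write to every replica;
            // the consistency level only controls how many acks it waits for.
            session.execute("INSERT INTO events (id, payload) VALUES (uuid(), 'x')");
            cluster.close();
        }
    }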

On Fri, Oct 6, 2017 at 12:51 PM, Jim Witschey <ji...@datastax.com>
wrote:

> > Modern client drivers also have ways to “downgrade” the CL of requests, in case they fail. E.g. for the Java driver: http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
>
> Quick note from a driver dev's perspective: Mark, yours sounds like a
> bad use case for a downgrading retry policy. If your cluster has an RF
> of 2 and your app requires CL.QUORUM, a downgrading policy will, for
> example, either try at CL.QUORUM and downgrade below your required CL,
> or try at CL.ALL, fail, and then downgrade to CL.QUORUM or an
> equivalent, which is what your app needed in the first place.
>

Re: Node failure

Posted by Jim Witschey <ji...@datastax.com>.
> Modern client drivers also have ways to “downgrade” the CL of requests, in case they fail. E.g. for the Java driver: http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html

Quick note from a driver dev's perspective: Mark, yours sounds like a
bad use case for a downgrading retry policy. If your cluster has an RF
of 2 and your app requires CL.QUORUM, a downgrading policy will, for
example, either try at CL.QUORUM and downgrade below your required CL,
or try at CL.ALL, fail, and then downgrade to CL.QUORUM or an
equivalent, which is what your app needed in the first place.



RE: Node failure

Posted by Mark Furlong <mf...@ancestry.com>.
I’ll check to see what our app is using.

Thanks
Mark
801-705-7115 office

From: Steinmaurer, Thomas [mailto:thomas.steinmaurer@dynatrace.com]
Sent: Friday, October 6, 2017 12:25 PM
To: user@cassandra.apache.org
Subject: RE: Node failure

QUORUM should succeed with RF=3 and 2 of 3 nodes available (quorum for RF=3 is 2 replicas).

Modern client drivers also have ways to “downgrade” the CL of requests, in case they fail. E.g. for the Java driver: http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html


Thomas

From: Mark Furlong [mailto:mfurlong@ancestry.com]
Sent: Freitag, 06. Oktober 2017 19:43
To: user@cassandra.apache.org
Subject: RE: Node failure

Thanks for the detail. I’ll have to remove and then add one back in. It’s my consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra <us...@cassandra.apache.org>
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to replace it. To replace, you'll start a new server with -Dcassandra.replace_address=a.b.c.d ( http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node ) , and it'll stream data from the neighbors and eventually replace the dead node in the ring (the dead node will be removed from 'nodetool status', the new node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll do some combination of repair, 'nodetool removenode' or 'nodetool assassinate', and ALTERing the keyspace to set RF=2. The order matters, and so does the consistency level you use for reads/writes (so we can tell you whether or not you're likely to lose data in this process), so I'm not giving step-by-step instructions here because it's not very straightforward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong <mf...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed?


RE: Node failure

Posted by "Steinmaurer, Thomas" <th...@dynatrace.com>.
QUORUM should succeed with RF=3 and 2 of 3 nodes available (quorum for RF=3 is 2 replicas).

Modern client drivers also have ways to “downgrade” the CL of requests, in case they fail. E.g. for the Java driver: http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html


Thomas
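For reference, a minimal sketch of wiring that policy into the 3.x Java driver (contact point and keyspace are placeholders); whether an automatic downgrade is what the application actually wants is worth weighing against the points raised elsewhere in this thread.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

    public class DowngradingClient {
        public static void main(String[] args) {
            // May retry failed requests at progressively weaker consistency levels.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                    .build();
            Session session = cluster.connect("my_ks");   // hypothetical keyspace
            // ... run queries as usual; any downgrading happens inside the driver.
            cluster.close();
        }
    }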

From: Mark Furlong [mailto:mfurlong@ancestry.com]
Sent: Freitag, 06. Oktober 2017 19:43
To: user@cassandra.apache.org
Subject: RE: Node failure

Thanks for the detail. I’ll have to remove and then add one back in. It’s my consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra <us...@cassandra.apache.org>
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to replace it. To replace, you'll start a new server with -Dcassandra.replace_address=a.b.c.d ( http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node ) , and it'll stream data from the neighbors and eventually replace the dead node in the ring (the dead node will be removed from 'nodetool status', the new node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll do some combination of repair, 'nodetool removenode' or 'nodetool assassinate', and ALTERing the keyspace to set RF=2. The order matters, and so does the consistency level you use for reads/writes (so we can tell you whether or not you're likely to lose data in this process), so I'm not giving step-by-step instructions here because it's not very straightforward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong <mf...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed?


RE: Node failure

Posted by Mark Furlong <mf...@ancestry.com>.
Thanks for the detail. I’ll have to remove and then add one back in. It’s my consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra <us...@cassandra.apache.org>
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to replace it. To replace, you'll start a new server with -Dcassandra.replace_address=a.b.c.d ( http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node ) , and it'll stream data from the neighbors and eventually replace the dead node in the ring (the dead node will be removed from 'nodetool status', the new node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll do some combination of repair, 'nodetool removenode' or 'nodetool assassinate', and ALTERing the keyspace to set RF=2. The order matters, and so does the consistency level you use for reads/writes (so we can tell you whether or not you're likely to lose data in this process), so I'm not giving step-by-step instructions here because it's not very straightforward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong <mf...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed?



Re: Node failure

Posted by Jeff Jirsa <jj...@gmail.com>.
There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically
remove it if it'll never be replaced, but in RF=3 with 3 nodes, you
probably need to replace it. To replace, you'll start a new server with
-Dcassandra.replace_address=a.b.c.d (
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
) , and it'll stream data from the neighbors and eventually replace the
dead node in the ring (the dead node will be removed from 'nodetool
status', the new node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll
do some combination of repair, 'nodetool removenode' or 'nodetool
assassinate', and ALTERing the keyspace to set RF=2. The order matters, and
so does the consistency level you use for reads/writes (so we can tell you
whether or not you're likely to lose data in this process), so I'm not
giving step-by-step instructions here because it's not very straightforward
and there are a lot of caveats.
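To make the ALTER step above concrete without turning it into a recipe, it is just a CQL statement; a minimal sketch with the Java driver follows (keyspace name and replication strategy are placeholders, and the ordering caveat above still applies in full).

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class DropReplicationFactor {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // The ALTER step in isolation; not a complete removal procedure.
            session.execute("ALTER KEYSPACE my_ks WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 2}");
            cluster.close();
        }
    }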




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong <mf...@ancestry.com> wrote:

> What happens when I have a 3 node cluster with RF 3 and a node fails that
> needs to be removed?

RE: Node failure

Posted by Mark Furlong <mf...@ancestry.com>.
We are using quorum on our reads and writes.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, October 6, 2017 11:30 AM
To: cassandra <us...@cassandra.apache.org>
Subject: Re: Node failure

If you write with CL:ANY, CL:ONE (or LOCAL_ONE), and one node fails, you may lose data that hasn't made it to other nodes.


On Fri, Oct 6, 2017 at 10:28 AM, Mark Furlong <mf...@ancestry.com> wrote:
The only time I’ll have a problem is if I do a read ALL or write ALL. Any other gotchas I should be aware of?

Thanks
Mark
801-705-7115 office

From: Akshit Jain [mailto:akshit13124@iiitd.ac.in]
Sent: Friday, October 6, 2017 11:25 AM
To: user@cassandra.apache.org
Subject: Re: Node failure

You replace it with a new node and bootstrapping happens. The new node receives data from the other two nodes.
The rest depends on the scenario you are asking about.

Regards
Akshit Jain
B-Tech,2013124
9891724697

On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong <mf...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed?



Re: Node failure

Posted by Jeff Jirsa <jj...@gmail.com>.
If you write with CL:ANY, CL:ONE (or LOCAL_ONE), and one node fails, you
may lose data that hasn't made it to other nodes.
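A hedged sketch of the contrast with the 3.x Java driver (keyspace and table are placeholder names): a QUORUM write against RF=3 only reports success once at least two replicas have acknowledged it, so losing any single node afterwards cannot take the only copy with it.

    import com.datastax.driver.core.*;

    public class DurableWrite {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_ks");   // hypothetical keyspace

            Statement write = new SimpleStatement(
                    "INSERT INTO events (id, payload) VALUES (uuid(), 'x')")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);  // 2 of 3 acks with RF=3

            // With ANY or ONE this could report success while a single node
            // (or just a hint) holds the data, which is the window described above.
            session.execute(write);
            cluster.close();
        }
    }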


On Fri, Oct 6, 2017 at 10:28 AM, Mark Furlong <mf...@ancestry.com> wrote:

> The only time I’ll have a problem is if I do a read ALL or write ALL.
> Any other gotchas I should be aware of?
>
>
>
> *Thanks*
>
> *Mark*
>
> *801-705-7115 office*
>
>
>
> *From:* Akshit Jain [mailto:akshit13124@iiitd.ac.in]
> *Sent:* Friday, October 6, 2017 11:25 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Node failure
>
>
>
> You replace it with a new node and bootstrapping happens. The new node
> receives data from the other two nodes.
>
> The rest depends on the scenario you are asking about.
>
>
> Regards
>
> Akshit Jain
>
> B-Tech,2013124
>
> 9891724697
>
>
>
>
> On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong <mf...@ancestry.com>
> wrote:
>
> What happens when I have a 3 node cluster with RF 3 and a node fails that
> needs to be removed?
>

RE: Node failure

Posted by Mark Furlong <mf...@ancestry.com>.
The only time I’ll have a problem is if I do a read ALL or write ALL. Any other gotchas I should be aware of?

Thanks
Mark
801-705-7115 office

From: Akshit Jain [mailto:akshit13124@iiitd.ac.in]
Sent: Friday, October 6, 2017 11:25 AM
To: user@cassandra.apache.org
Subject: Re: Node failure

You replace it with a new node and bootstrapping happens. The new node receives data from the other two nodes.
The rest depends on the scenario you are asking about.

Regards
Akshit Jain
B-Tech,2013124
9891724697

On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong <mf...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed?



Re: Node failure

Posted by Akshit Jain <ak...@iiitd.ac.in>.
You replace it with a new node and bootstrapping happens. The new node
receives data from the other two nodes.
The rest depends on the scenario you are asking about.

Regards
Akshit Jain
B-Tech,2013124
9891724697


On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong <mf...@ancestry.com> wrote:

> What happens when I have a 3 node cluster with RF 3 and a node fails that
> needs to be removed?