Posted to user@cassandra.apache.org by Martin Xue <ma...@gmail.com> on 2019/08/02 07:21:46 UTC

What happens when one node in the cluster is down?

Hello,

I am currently running into a production issue and am seeking help from the
community.

Can anyone help with the following questions regarding a down Cassandra node
in the cluster?

Case:
Cassandra 3.0.14
3 nodes (A, B, C) in DC1, 3 nodes (D, E, F) in DC2 forming one cluster

keyspace_m: replication factor is 2 in both DC1 and DC2

application_z: read and write consistency are both local quorum


Issue:
node A in DC1 has crashed and has been down for more than 24 hours
(outside the default 3-hour hint window).

Questions:
1. For the old data on node A, will that data be re-synced to node B or C
now that node A is down?
2. For new data, if application_z is trying to write, will the data always be
written to the two running nodes (B and C) in DC1, or will the write fail
because it still targets node A?
3. If application_z reads, will it fail (for old data written before node A
crashed, and for new data written after)? Will the data be replicated from A
to B or C?
4. What is the best strategy under this scenario?
5. Shall I bring node A back up and run repair on all the nodes (A, B, C, D,
E, F)?
(a potential issue, as repair may cause a crash similar to the one on node A,
and there is a large 1 TB keyspace to repair)
6. Shall I simply decommission node A and add a new node to DC1?


Your help would be appreciated.

Thanks
Regards
Martin

Re: What happens when one node in the cluster is down?

Posted by Martin Xue <ma...@gmail.com>.
Thank you Jeff, appreciated.

Regards
Martin

On Fri, Aug 2, 2019 at 10:52 PM Jeff Jirsa <jj...@gmail.com> wrote:

>
>
>
> On Aug 2, 2019, at 5:47 AM, Oleksandr Shulgin <
> oleksandr.shulgin@zalando.de> wrote:
>
> > 4. What is the best strategy under this scenario?
>>
>> Go to RF=3 or read and write at quorum so you’re doing 3/4 instead of 2/2
>
>
> Jeff, did you mean "2/3 vs. 2/2"?
>
>
> No but my wording was poor
>
> Moving to RF=3 would be local quorum 2/3, able to tolerate a down host but
> increases storage needs
>
> Shifting operations to quorum would use both DCs and make it 3/4, tolerating
> one down host with no increase in storage cost, but it adds WAN latency for
> cross-DC requests
>
>

Re: What happens when one node in the cluster is down?

Posted by Jeff Jirsa <jj...@gmail.com>.


On Aug 2, 2019, at 5:47 AM, Oleksandr Shulgin <ol...@zalando.de> wrote:

>> > 4. What is the best strategy under this scenario?
>> 
>> Go to RF=3 or read and write at quorum so you’re doing 3/4 instead of 2/2
> 
> Jeff, did you mean "2/3 vs. 2/2"?
> 

No but my wording was poor

Moving to RF=3 would be local quorum 2/3, able to tolerate a down host but increases storage needs 

Shifting operations to quorum would use both DCs and make it 3/4, tolerating one down host with no increase in storage cost, but it adds WAN latency for cross-DC requests


Re: What happens when one node in the cluster is down?

Posted by Oleksandr Shulgin <ol...@zalando.de>.
>
> > 4. What is the best strategy under this scenario?
>
> Go to RF=3 or read and write at quorum so you’re doing 3/4 instead of 2/2


Jeff, did you mean "2/3 vs. 2/2"?

-Alex

Re: What happens when one node in the cluster is down?

Posted by Jeff Jirsa <jj...@gmail.com>.

> On Aug 2, 2019, at 12:21 AM, Martin Xue <ma...@gmail.com> wrote:
> 
> Hello,
> 
> I am currently running into a production issue and am seeking help from the community.
> 
> Can anyone help with the following questions regarding a down Cassandra node in the cluster?
> 
> Case:
> Cassandra 3.0.14
> 3 nodes (A, B, C) in DC1, 3 nodes (D, E, F) in DC2 forming one cluster
> 
> keyspace_m: replication factor is 2 in both DC1 and DC2
> 
> application_z: read and write consistency are both local quorum
> 

RF=2 with local quorum basically guarantees an outage in a given DC if any single host dies, so it's only recommended if you can fail out of a DC safely (which implies an eventually consistent data model: when you fail out, the remote DC is in an undefined state since you're using local quorum)
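
For reference, here is the quorum arithmetic behind that statement, as a tiny
illustrative Python sketch (Cassandra applies this per DC for local quorum):

    # Quorum is floor(RF / 2) + 1 replicas.
    def quorum(rf):
        return rf // 2 + 1

    print(quorum(2))  # 2 -> needs both of 2 replicas up, tolerates 0 down nodes
    print(quorum(3))  # 2 -> needs 2 of 3 replicas up, tolerates 1 down node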

> 
> Issue:
> node A in DC1 has crashed and has been down for more than 24 hours (outside the default 3-hour hint window).
> 
> Questions:
> 1. For the old data on node A, will that data be re-synced to node B or C now that node A is down?

Both, but only B or C for any piece of data

With RF=2, data is on either:
AB
BC
AC

So if A crashes, bringing it back or replacing it will sync from its only surviving replica for each piece of data
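
Using the same three pairs, a rough sketch (illustrative Python only, assuming
token ranges are spread evenly) of how much data loses quorum while A is down:

    # With RF=2 on 3 nodes, each piece of data lives on one of three pairs.
    pairs = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
    down = {"A"}
    # A local quorum (2 of 2) request succeeds only if both replicas are up.
    affected = [p for p in pairs if p & down]
    print(f"{len(affected)} of {len(pairs)} ranges lose quorum")  # 2 of 3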

> 2. For new data, if application_z is trying to write, will the data always be written to the two running nodes (B and C) in DC1, or will the write fail because it still targets node A?

It will fail. Ownership doesn’t change just because one host goes down. For a piece of data owned by A and any other node, you’re going to fail if A is down and you use this replication factor and consistency 

> 3. If application_z reads, will it fail (for old data written before node A crashed, and for new data written after)? Will the data be replicated from A to B or C?
It will fail and throw an unavailable exception.
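
From the application side that looks roughly like the sketch below, using the
DataStax Python driver; the contact point and table are made up, and the exact
exception surfaced can depend on the driver's retry policy:

    from cassandra import ConsistencyLevel, Unavailable
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])            # hypothetical contact point
    session = cluster.connect("keyspace_m")

    stmt = SimpleStatement(
        "SELECT * FROM some_table WHERE id = 1",  # hypothetical table
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    try:
        session.execute(stmt)
    except Unavailable as exc:
        # The coordinator sees only 1 of the 2 required replicas alive and
        # rejects the request up front; nothing is read at all.
        print(exc.required_replicas, exc.alive_replicas)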


> 4. What is the best strategy under this scenario?

Go to RF=3 or read and write at quorum so you're doing 3/4 instead of 2/2 (but then you'll fail if the WAN link goes down, and your reads and writes will cross the WAN, adding latency)
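
If you do go to RF=3, the keyspace change itself is a one-line ALTER, but the
new third replicas only receive the existing data once you repair. A sketch
(keyspace and DC names taken from this thread; follow up with nodetool repair
keyspace_m on each node):

    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1"])  # hypothetical contact point
    session = cluster.connect()

    # Raise keyspace_m to RF=3 in both DCs; existing data reaches the new
    # replicas only after a repair of keyspace_m on every node.
    session.execute("""
        ALTER KEYSPACE keyspace_m
        WITH replication = {'class': 'NetworkTopologyStrategy',
                            'DC1': 3, 'DC2': 3}
    """)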

> 5. Shall I bring node A back up and run repair on all the nodes (A, B, C, D, E, F)?
> (a potential issue, as repair may cause a crash similar to the one on node A, and there is a large 1 TB keyspace to repair)

Since you're past the hint window, you're going to have a lot of data to repair, and your chance of resurrecting data due to exceeding gc grace is nonzero, so it may make sense to replace the node. Replacing will take longer, so bringing A back online may be an easier way to end the outage, depending on the business cost of data resurrection (unless you have "only purge repaired tombstones" enabled, which will prevent resurrection, though it potentially introduces other issues with incremental repair)

> 6. Shall I simply decommission node A and add a new node to DC1?

May be easier than trying to run repair. In this scenario only, you can replace without running repair and without violating consistency 
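
For what it's worth, the usual way to do that replacement is a direct replace
rather than a decommission followed by a bootstrap: start the replacement node
with the -Dcassandra.replace_address_first_boot=<IP of dead node A> JVM option
while A stays down, let it take over A's tokens and stream A's data from the
surviving replicas, then remove the old node. (Check the exact option name
against the docs for your 3.0 release.)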

> 
> 
> Your help would be appreciated.
> 
> Thanks
> Regards
> Martin
> 
