Posted to user@cassandra.apache.org by Anuj Wadehra <an...@yahoo.co.in> on 2016/01/17 05:33:26 UTC

Run Repairs when a Node is Down

Hi 
We are on 2.0.14 with RF=3 in a 3-node cluster, and we use repair -pr. Recently we observed that repair -pr fails on every node if one node is down. Then I found the JIRA https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-2290
where an intentional decision was taken to abort the repair if a replica is down.
I need to understand the reasoning behind aborting the repair instead of proceeding with the available replicas.
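For context, -pr repairs only the primary ranges of the node it runs on, so our routine is to run it on every node in turn, roughly as follows (host and keyspace names here are only illustrative):

    # run on each node in turn; every node repairs only its own primary ranges
    nodetool -h node1 repair -pr my_keyspace
    nodetool -h node2 repair -pr my_keyspace
    nodetool -h node3 repair -pr my_keyspace

With any one of the three nodes down, every one of these invocations aborts.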
I have the following concerns with the approach:
We say that we have a fault-tolerant Cassandra system such that we can afford a single node failure, because RF=3 and we read/write at QUORUM. But when a node goes down and we are not sure how long it will take to restore it, the health of the entire system is in question: the end of the gc grace window is approaching and we are not able to run repair -pr on any of the nodes.
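Concretely, the arithmetic we rely on is nothing more than this (a trivial sketch, nothing cluster-specific):

    # quorum arithmetic for RF=3
    RF=3
    QUORUM=$(( RF / 2 + 1 ))           # = 2
    TOLERATED_DOWN=$(( RF - QUORUM ))  # = 1 node may be down for QUORUM reads/writes
    echo "QUORUM=$QUORUM, tolerated down=$TOLERATED_DOWN"

So reads and writes keep working with one node down, yet repair -pr refuses to run at all.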
Then there is a dilemma:
Remove the faulty node well before the gc grace period expires, so that we have enough time to protect the data by repairing the other two nodes?
This may cause massive streaming, which may prove unnecessary if we manage to bring the faulty node back up before the gc grace period expires.
OR
Wait and hope that the issue is resolved before the gc grace period runs out, leaving us some buffer to run repair -pr on all nodes.
OR
Increase gc_grace_seconds temporarily. Then we need capacity planning to accommodate the extra storage required for the longer grace period in node failure scenarios. (The first and last options are sketched below.)
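To make the first and the last option concrete (keyspace, table, host ID, and the 20-day value below are only illustrative; the default gc_grace_seconds is 864000, i.e. 10 days):

    # last option: temporarily raise gc_grace_seconds on the affected tables
    cqlsh <<'CQL'
    ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 1728000;  -- 20 days
    CQL

    # first option: give up on the faulty node and stream its data to the survivors
    nodetool removenode <host-id-of-dead-node>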

Besides knowing the reasoning behind the decision taken in CASSANDRA-2290, I need to understand the recommended approach for maintaining a fault-tolerant system that can handle node failures, so that repair can run smoothly and system health is maintained at all times.

Thanks
Anuj

Sent from Yahoo Mail on Android

Re: Run Repairs when a Node is Down

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Thanks, Paulo, for sharing the JIRA! I have added my comments there.
"It is not advisable to remain with a down node for a long time without replacing it (with risk of not being able to achieve consistency if another node goes down)."
I am referring to a generic scenario where a cluster may be able to afford 2+ node failures based on its RF, yet because of a single node failure the health of the entire system is in question, since the gc grace period is running out and the nodes are not getting repaired.
I think the issue is important. I would suggest that you and others interested in it join the discussion on the JIRA page: https://issues.apache.org/jira/browse/CASSANDRA-10446


Thanks
Anuj

Sent from Yahoo Mail on Android 
 

Re: Run Repairs when a Node is Down

Posted by Paulo Motta <pa...@gmail.com>.
Hello Anuj,

Repairing a range with down replicas may be valid if there are still QUORUM replicas up and you are writing at (at least) QUORUM. My understanding is that it was disabled as the default behavior in CASSANDRA-2290 to avoid misuse/confusion, and it's not advisable to remain with a down node for a long time without replacing it (with the risk of not being able to achieve consistency if another node goes down).
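As a concrete sanity check before repairing with a replica down (keyspace, table, and key below are just placeholders), you can verify which nodes replicate a given key and how many of them are currently up:

    # which nodes hold the replicas for a given partition key
    nodetool getendpoints my_keyspace my_table some_key
    # which nodes are up or down right now
    nodetool status my_keyspace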

Issue https://issues.apache.org/jira/browse/CASSANDRA-10446 was created to allow repairing ranges with down replicas via a special flag (--force). If you're interested, please add comments there and/or propose a patch.
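Once that lands, usage would presumably look something like the following (the flag name comes from the ticket; the exact syntax may change before it is merged):

    # hypothetical, per CASSANDRA-10446: proceed with live replicas instead of aborting
    nodetool repair -pr --force my_keyspace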

Thanks,

Paulo


