Posted to commits@cassandra.apache.org by "Peter Schuller (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/27 20:38:47 UTC

[jira] [Issue Comment Edited] (CASSANDRA-3722) Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217445#comment-13217445 ] 

Peter Schuller edited comment on CASSANDRA-3722 at 2/27/12 7:38 PM:
--------------------------------------------------------------------

I'm -0 on the original bit of this ticket, but +1 on more generic changes that cover the original use case as well, if not better. I think that instead of trying to predict exactly the behavior of some particular event like compaction, we should simply be better at responding to what is actually going on:

* We have CASSANDRA-2540, which can help avoid blocking uselessly on a dropped or slow request even if we haven't had the opportunity to react to overall behavior yet (I have a partial patch that currently breaks read repair; I haven't had time to finish it).
* Taking into account the number of outstanding requests is IMO a necessity (see the sketch after this list). There is plenty of precedent for anyone who wants it (least-used-connections policies in various load balancers), but more importantly it would clearly help in several situations, including:
** Sudden GC pause of a node
** Sudden death of a node
** Sudden page cache eviction and slowness of a node, before snitching figures it out
** Constantly overloaded node; even with the dynsnitch it would improve the situation as the number of requests affected by a dynsnitch reset is lessened
** Packet loss/hiccups/other network issues across DCs
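
To make the outstanding-requests idea a bit more concrete, here is a rough sketch of tracking in-flight requests per endpoint and folding the count into replica ordering. All class and method names are made up for illustration; this is not the existing dynamic snitch API:

{code}
import java.net.InetAddress;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch only: names are invented, not the real snitch code.
public class OutstandingRequestTracker
{
    private final ConcurrentHashMap<InetAddress, AtomicInteger> outstanding = new ConcurrentHashMap<>();

    // Call when a request is sent to an endpoint.
    public void requestSent(InetAddress endpoint)
    {
        outstanding.computeIfAbsent(endpoint, e -> new AtomicInteger()).incrementAndGet();
    }

    // Call when a response (or timeout) comes back from an endpoint.
    public void responseReceived(InetAddress endpoint)
    {
        AtomicInteger count = outstanding.get(endpoint);
        if (count != null)
            count.decrementAndGet();
    }

    public int outstandingFor(InetAddress endpoint)
    {
        AtomicInteger count = outstanding.get(endpoint);
        return count == null ? 0 : count.get();
    }

    // Order replicas by the snitch's latency estimate scaled by the number of requests
    // currently in flight. A GC-paused or dead node whose latency history still looks
    // good accumulates outstanding requests and drops in the ordering immediately,
    // instead of waiting for latency samples to catch up.
    public void sortByScore(List<InetAddress> replicas, ToDoubleFunction<InetAddress> latencyScore)
    {
        replicas.sort(Comparator.comparingDouble(
            (InetAddress e) -> latencyScore.applyAsDouble(e) * (1 + outstandingFor(e))));
    }
}
{code}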

There is some potential for foot-shooting in the sense that if a node is broken in such a way that it responds with incorrect data, but responds faster than anyone else, it will tend to "swallow" all the traffic. Honestly though, that feels like a minor concern to me based on what I've seen actually happen in production clusters. If we ever start sending non-successes back over inter-node RPC, however, this would change.

My only major concern is the potential performance impact of keeping track of the number of outstanding requests, but if that *does* become a problem one can make it probabilistic - only track N% of all requests. Less impact, but also a less immediate response to what's happening.
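
A sketch of what that sampling could look like (the rate and the integration point are assumptions, nothing like this exists today):

{code}
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: track roughly N% of requests to reduce tracking overhead.
public class SampledTracking
{
    private final double sampleRate; // e.g. 0.10 to track roughly 10% of requests

    public SampledTracking(double sampleRate)
    {
        this.sampleRate = sampleRate;
    }

    // Decide per request whether to pay the tracking cost.
    public boolean shouldTrack()
    {
        return ThreadLocalRandom.current().nextDouble() < sampleRate;
    }
}
{code}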

This will also have the side effect of mitigating sudden bursts of promotion into old-gen if we combine it with proactively dropping read-repair messages for nodes that are overloaded (effectively prioritizing data reads), hence helping with CASSANDRA-3853.

{quote}
Should we T (send additional requests which are not part of the normal operations) the requests until the other node recovers?
{quote}

In the absence of read repair, we'd have to do speculative reads, as Stu has previously noted. With read repair turned on this is not an issue, because the node will still receive requests and eventually warm up. Only with read repair turned off do we avoid sending requests to more than the first N endpoints, with N being what is required by the consistency level.
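
Roughly, the behavior I'm describing looks like this (illustrative pseudocode only, not the actual read path; the names are made up):

{code}
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the read-target selection described above.
public class ReadTargetSelection
{
    // With read repair enabled (chance > 0), all replicas occasionally receive the read
    // and so stay warm; with it disabled, only the first blockFor endpoints required by
    // the consistency level ever see traffic.
    public static List<InetAddress> targets(List<InetAddress> sortedReplicas, int blockFor, double readRepairChance)
    {
        if (ThreadLocalRandom.current().nextDouble() < readRepairChance)
            return sortedReplicas; // read from everyone; extra responses are compared and repaired
        return sortedReplicas.subList(0, Math.min(blockFor, sortedReplicas.size()));
    }
}
{code}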

Semi-relatedly, I think it would be a good idea to make the proximity sorting probabilistic in nature, so that we don't do a binary flip back and forth between who gets data vs. digest reads, or who doesn't get reads at all (see the sketch below). That might mitigate this problem, but it wouldn't help fundamentally, since the rate of warm-up would still decrease as a node becomes slow.
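
One way "probabilistic proximity sorting" could work is a weighted shuffle over the snitch scores instead of a strict sort, so a slightly worse-scoring replica still sees some fraction of the data reads. A sketch, with made-up names:

{code}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Weighted selection without replacement: lower-scoring (closer/faster) replicas are
// picked first more often, but a somewhat slower replica still gets a share of the
// data reads and therefore stays warm.
public class ProbabilisticSorter
{
    public static List<InetAddress> order(Map<InetAddress, Double> scores)
    {
        List<InetAddress> remaining = new ArrayList<>(scores.keySet());
        List<InetAddress> result = new ArrayList<>(remaining.size());
        while (!remaining.isEmpty())
        {
            double total = 0;
            for (InetAddress e : remaining)
                total += 1.0 / (scores.get(e) + 1e-9); // weight = inverse score
            double r = ThreadLocalRandom.current().nextDouble() * total;
            InetAddress chosen = remaining.get(remaining.size() - 1); // fallback for rounding
            for (InetAddress e : remaining)
            {
                r -= 1.0 / (scores.get(e) + 1e-9);
                if (r <= 0)
                {
                    chosen = e;
                    break;
                }
            }
            remaining.remove(chosen);
            result.add(chosen);
        }
        return result;
    }
}
{code}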

I do want to make this point though: *every single production cluster* I have ever been involved with so far has been such that you basically never want to turn read repair off. Not because of read repair itself, but because of the traffic it generates. Having nodes not receive traffic is extremely dangerous under most circumstances, as it leaves nodes cold, only to suddenly explode and cause timeouts and other bad behavior as soon as e.g. a neighbor goes down and they suddenly start taking traffic. This is an easy way to make production clusters fall over. If your workload is entirely in memory or otherwise not reliant on caching, the problem is much less pronounced, but even then I would generally recommend keeping read repair turned on, if only because your nodes will have to be able to take the additional load *anyway* if you are to survive other nodes in the neighborhood going down. It just makes clusters much easier to reason about.
                
> Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3722
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3722
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>
> Currently the dynamic snitch looks at latency to figure out which node will better serve requests. This works great, but some traffic has to be sent just to collect this data... There is also a window during which the snitch doesn't know about a major event that is about to happen on the node which is going to receive the data request.
> It would be great if we could send some sort of hints to the snitch so it can score based on known events that cause higher latencies.
