Posted to commits@cassandra.apache.org by "Peter Schuller (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/27 20:22:46 UTC

[jira] [Issue Comment Edited] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

    [ https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217415#comment-13217415 ] 

Peter Schuller edited comment on CASSANDRA-3294 at 2/27/12 7:21 PM:
--------------------------------------------------------------------

{quote}
This sounds like reinventing the existing failure detector to me.
{quote}

Except we don't use it that way at all (see CASSANDRA-3927). Even if we did, though, I personally think it's totally the wrong solution to this problem, since we have the *perfect* measurement: whether the TCP connection is up.

It's fine if we have other information that actively indicates we shouldn't send messages to a node (whether it's the FD or the fact that we have 500,000 messages queued to it), but if we *know* the TCP connection is down, we should just not send messages to it, period. The only caveat is that we would of course have to make sure TCP connections are in fact proactively kept up under all circumstances (I'd have to look at the code to figure out what issues, if any, there are in detail).
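
To make that concrete, here is roughly the shape of the send path I have in mind (a sketch only; the class and method names are made up and are not the actual MessagingService/OutboundTcpConnection API):

{code}
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a send path that refuses to enqueue a message for an endpoint whose
// TCP connection is known to be down. Only meaningful if connections are proactively
// kept alive, per the caveat above.
public class ConnectionAwareSender
{
    // Populated elsewhere whenever an outbound connection is (re)established or torn down.
    private final Map<InetAddress, Socket> sockets = new ConcurrentHashMap<InetAddress, Socket>();

    /** "No established socket" is treated the same as "down" for routing reads/writes. */
    public boolean isReachable(InetAddress endpoint)
    {
        Socket s = sockets.get(endpoint);
        return s != null && s.isConnected() && !s.isClosed();
    }

    public boolean trySend(InetAddress endpoint, byte[] message) throws IOException
    {
        if (!isReachable(endpoint))
            return false; // fail fast; let the coordinator pick another replica instead of waiting for rpc_timeout

        sockets.get(endpoint).getOutputStream().write(message);
        return true;
    }
}
{code}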

{quote}
The main idea of the algorithm I have mentioned is to make sure that we always do operations (write/read etc.) on the nodes that have the highest probability to be alive determined by live traffic going there instead of passively relying on the failure detector.
{quote}

I have an unfiled ticket to suggest making the proximity sorting probabilistic to avoid the binary "either we get traffic or we don't" (or "either we get data or we get digest") situation. That would certainly help. As would least-requests-outstanding.
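
To illustrate the probabilistic-sorting idea (purely a sketch with made-up names, not the dynamic snitch's actual code): pick each next replica with probability inversely proportional to its latency score, so the second-best node still sees some real traffic instead of only ever getting digests.

{code}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch: instead of strictly sorting replicas by latency score, order them by
// weighted-random draws (lower score = better = higher weight).
public class ProbabilisticReplicaSorter
{
    private final Random random = new Random();

    public List<InetAddress> order(Map<InetAddress, Double> latencyScores)
    {
        List<InetAddress> remaining = new ArrayList<InetAddress>(latencyScores.keySet());
        List<InetAddress> ordered = new ArrayList<InetAddress>(remaining.size());
        while (!remaining.isEmpty())
        {
            double total = 0;
            for (InetAddress ep : remaining)
                total += 1.0 / (latencyScores.get(ep) + 0.001);

            double r = random.nextDouble() * total;
            for (int i = 0; i < remaining.size(); i++)
            {
                r -= 1.0 / (latencyScores.get(remaining.get(i)) + 0.001);
                if (r <= 0 || i == remaining.size() - 1)
                {
                    ordered.add(remaining.remove(i));
                    break;
                }
            }
        }
        return ordered;
    }
}
{code}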

You can totally make this ticket irrelevant by making the general case well-supported enough that there is no reason to special-case this. It was originally filed because we had none of that (and we still don't), and handling the case where the TCP connection is actively reset by the other side seemed like a very trivial case to cover.

{quote}
After reading CASSANDRA-3722 it seems we can implement required logic at the snitch level taking latency measurements into account. I think we can close this one in favor of CASSANDRA-3722 and continue work/discussion there. What do you think, Brandon, Peter?
{quote}

I think CASSANDRA-3722's original premise doesn't address the concerns I see in real life (I don't want special cases trying to communicate "X is happening"), but towards the end I start agreeing with the ticket more.

In any case, feel free to close if you want. If I ever get to actually implementing this (if at that point there is no other mechanism to remove the need) I'll just re-file or re-open with a patch. We don't need to track this if others aren't interested.
                
> a node whose TCP connection is not up should be considered down for the purpose of reads and writes
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3294
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3294
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Cassandra fails to handle the simplest of cases intelligently - a process gets killed and the TCP connection dies. I cannot see a good reason to wait for a bunch of RPC timeouts and thousands of hung requests to realize that we shouldn't be sending messages to a node when the only possible means of communication is confirmed down. This is why one has to "disablegossip and wait for a while" to restart a node on a busy cluster (especially without CASSANDRA-2540, but that only helps under certain circumstances).
> A more generalized approach, whereby one e.g. weighs in the number of currently outstanding RPC requests to a node, would likely take care of this case as well. But until such a thing exists and works well, it seems prudent to handle this very common and controlled form of "failure" better.
> Are there difficulties I'm not seeing?
> I can see that one may want to distinguish between considering something "really down" (and e.g. fail a repair because it's down) from what I'm talking about, so maybe there are different concepts (say one is "currently unreachable" rather than "down") being conflated. But in the specific case of sending reads/writes to a node we *know* we cannot talk to, it seems unnecessarily detrimental.
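
As a rough illustration of the "weigh in outstanding RPC requests" idea from the description above (a sketch with made-up names, not existing Cassandra code), a per-endpoint in-flight counter lets a dead node fall to the back of the replica ordering on its own as its requests hang:

{code}
import java.net.InetAddress;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: count in-flight requests per endpoint and prefer replicas with the fewest
// outstanding requests. A node we cannot reach accumulates hung requests until they
// time out, so it naturally drops to the back of the ordering.
public class OutstandingRequestTracker
{
    private final ConcurrentMap<InetAddress, AtomicInteger> inFlight =
            new ConcurrentHashMap<InetAddress, AtomicInteger>();

    private AtomicInteger counterFor(InetAddress endpoint)
    {
        AtomicInteger counter = inFlight.get(endpoint);
        if (counter == null)
        {
            AtomicInteger created = new AtomicInteger();
            counter = inFlight.putIfAbsent(endpoint, created);
            if (counter == null)
                counter = created;
        }
        return counter;
    }

    public void onSend(InetAddress endpoint)              { counterFor(endpoint).incrementAndGet(); }
    public void onResponseOrTimeout(InetAddress endpoint) { counterFor(endpoint).decrementAndGet(); }

    /** Order candidate replicas by how many requests are currently outstanding to each. */
    public void sortByLeastOutstanding(List<InetAddress> replicas)
    {
        Collections.sort(replicas, new Comparator<InetAddress>()
        {
            public int compare(InetAddress a, InetAddress b)
            {
                return counterFor(a).get() - counterFor(b).get(); // counts are small, no overflow concern
            }
        });
    }
}
{code}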
