Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/04/12 19:20:05 UTC

[jira] [Commented] (CASSANDRA-1988) Prefer to throw Unavailable rather than Timeout

    [ https://issues.apache.org/jira/browse/CASSANDRA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018917#comment-13018917 ] 

Jonathan Ellis commented on CASSANDRA-1988:
-------------------------------------------

If we deserialized the messages in OutboundTcpConnection.closeSocket and poked some kind of "unable to send message" status into the callback, then the callback could use that to throw UnavailableException.

(Deserializing is probably the easiest way to avoid making a bunch of fairly hairy changes to the MS/OTC flow. Performance is a non-issue since we only do it when a node goes down.)

We'd also need to introduce a different exception, since UE signals "I knew I couldn't satisfy the request so I didn't start it," which is useful to distinguish from "some of the replicas may have performed the write, but not enough of them."

Finally, you might still time out before the FailureDetector signals that the node died, so you still have to deal with the original behavior.
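
To make the three outcomes above concrete, here is a rough sketch of what such a callback might look like. Every name in it (WriteCallbackSketch, onSendFailure, WriteOutcome) is hypothetical and is not existing Cassandra code. The rule used to pick the new failure case (too few messages could have left the node to reach blockFor) is an assumption of the sketch, not something settled in this ticket.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types exist in Cassandra.
enum WriteOutcome {
    SUCCESS,      // enough replicas acknowledged
    SEND_FAILED,  // would map to the *new* exception, not UnavailableException
    TIMED_OUT     // original behavior: no failure signal arrived in time
}

final class WriteCallbackSketch {
    private final int blockFor;      // acks required by the consistency level
    private final int totalTargets;  // replicas the request was addressed to
    private final AtomicInteger acks = new AtomicInteger();
    private final AtomicInteger unsent = new AtomicInteger();

    WriteCallbackSketch(int blockFor, int totalTargets) {
        this.blockFor = blockFor;
        this.totalTargets = totalTargets;
    }

    // A replica answered.
    void onResponse() {
        acks.incrementAndGet();
    }

    // Assumed hook: invoked when a connection is closed while this
    // request's message is still sitting in its outbound queue.
    void onSendFailure() {
        unsent.incrementAndGet();
    }

    // Evaluated when the request timeout expires.
    WriteOutcome outcomeAtTimeout() {
        if (acks.get() >= blockFor) {
            return WriteOutcome.SUCCESS;
        }
        // Messages that may have reached a replica (acknowledged or not).
        int possiblyDelivered = totalTargets - unsent.get();
        if (possiblyDelivered < blockFor) {
            // So many messages never left that blockFor was unreachable,
            // yet some replicas may still have applied the write.
            return WriteOutcome.SEND_FAILED;
        }
        // No (or too few) failure signals before the deadline.
        return WriteOutcome.TIMED_OUT;
    }
}
```

Note that TIMED_OUT remains reachable even with the new hook, which is the point of the previous paragraph: if the failure signal arrives after the deadline, the caller sees the same timeout as today.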

Feels like a lot of complexity for a minor corner case.

> Prefer to throw Unavailable rather than Timeout
> -----------------------------------------------
>
>                 Key: CASSANDRA-1988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1988
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>             Fix For: 1.0
>
>
> When a node is unreachable, but is not yet being reported dead by gossip, messages are enqueued in the messaging service to be sent when the node becomes available again (on the assumption that the connection dropped temporarily).
> Higher up in the client layer, before sending messages to other nodes, we check that they are alive according to gossip, and fail fast with UnavailableException if they are not (CASSANDRA-1803). If we send messages to nodes that are not yet being reported dead, the messages sit in the queue and time out rather than being sent: this results in the client request failing with a TimeoutException.
> If we differentiate between messages that were never sent (i.e., are still queued in the MessagingService when the timeout expires) and messages that were sent but didn't get a response, we can properly throw UnavailableException in the former case.
