Posted to commits@cassandra.apache.org by "Benedict Elliott Smith (Jira)" <ji...@apache.org> on 2020/02/22 11:12:00 UTC

[jira] [Comment Edited] (CASSANDRA-15352) Replica failure propagation to coordinator and client

    [ https://issues.apache.org/jira/browse/CASSANDRA-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042515#comment-17042515 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15352 at 2/22/20 11:11 AM:
------------------------------------------------------------------------------

Cases where the query is already known to have failed, but the client simply waits until the timeout to receive notification (of a timeout, not a failure); and in particular, those where a replica has failed to serve its part of the work but does not report this back to the coordinator, I think?

Though admittedly I'm unclear on its interaction with cheap quorums (which cannot reasonably be informed by replica information, since the cheap quorum is only needed if the replica doesn't respond, though there may be a subset of cases where the replica is unable to perform the work but is able to respond).


was (Author: benedict):
When the query is already known to have failed, but the client simply waits until timeout to receive notification (of a timeout, not failure)

> Replica failure propagation to coordinator and client
> -----------------------------------------------------
>
>                 Key: CASSANDRA-15352
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15352
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Messaging/Internode
>            Reporter: Alex Petrov
>            Priority: Normal
>
> We should add early reporting of replica-side errors, since currently we just time out requests. On the normal read-write path this is not that important, but it is a protocol change we will need in order to improve cheap quorums for transient replication. It might also have a positive impact on the regular read-write path, since we’ll be aborting queries early instead of timing them out. It can be useful for failing / going-away nodes (which is also one of the changes we’re planning to implement). 
> We already have the means to propagate errors, both in the client protocol through <reasonmap> and in internode messaging through FAILURE_RSP, so we do not have to extend the protocol to implement this change. It is still a change in protocol behavior, though, since we’ll be sending a message where we would usually silently time out.
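
To make the behavior change described above concrete, here is a minimal sketch of a coordinator that aborts as soon as explicit replica failures make the required number of successes unreachable, rather than waiting for the timeout. It is a toy simulation in Python, not Cassandra's Java implementation: the names Response, coordinate and the "replica error" reason string are hypothetical, and the returned dict only stands in for the kind of per-replica information a <reasonmap> would carry.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Response:
    """A message arriving at the coordinator (hypothetical model)."""
    replica: str
    at_ms: int   # arrival time, relative to the start of the request
    ok: bool     # True = success; False = explicit failure report


def coordinate(responses: Iterable[Response], replicas: int,
               required: int, timeout_ms: int) -> Tuple[str, int, Dict[str, str]]:
    """Return (outcome, elapsed_ms, failure reasons by replica).

    Replicas that never respond are simply absent from `responses`;
    they model today's silent failures that surface only as a timeout.
    """
    successes = 0
    failures: Dict[str, str] = {}
    for r in sorted(responses, key=lambda r: r.at_ms):
        if r.at_ms >= timeout_ms:
            break  # anything arriving after the deadline is ignored
        if r.ok:
            successes += 1
            if successes >= required:
                return ("SUCCESS", r.at_ms, failures)
        else:
            failures[r.replica] = "replica error"  # placeholder reason
            # Early abort: too few replicas remain to ever reach `required`.
            if replicas - len(failures) < required:
                return ("FAILURE", r.at_ms, failures)
    return ("TIMEOUT", timeout_ms, failures)
```

With three replicas and two required successes, one success plus two silent replicas runs to the full timeout, whereas one success plus two explicit failure reports lets the coordinator answer as soon as the second failure arrives. The sketch deliberately ignores cheap quorums and speculative retries.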


