Posted to commits@cassandra.apache.org by "Geoffrey Yu (JIRA)" <ji...@apache.org> on 2016/08/19 23:58:20 UTC

[jira] [Updated] (CASSANDRA-2848) Make the Client API support passing down timeouts

     [ https://issues.apache.org/jira/browse/CASSANDRA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-2848:
-----------------------------------
    Attachment: 2848-trunk-v2.txt

I'm attaching a second version of the patch that incorporates the changes in CASSANDRA-12256.

*TL;DR:* The timeout is represented as an {{OptionalLong}} that is encoded in {{QueryOptions}}. It is passed all the way to the replica nodes on reads through {{ReadCommand}}, but is only kept on the coordinator for writes.


The optional client-specified timeout is decoded as part of {{QueryOptions}}. Since a client may or may not specify it, I opted to use an {{OptionalLong}} to make that explicit in the code. I’ve gated the new timeout flag (and the encoding of the timeout) to protocol v5 and above.
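To illustrate the gating described above, here is a minimal sketch of flag-based encoding and decoding of an optional timeout. It is not the code from the patch: the class name {{TimeoutOptionCodec}}, the flag bit {{0x80}}, the single flags byte, and the millisecond unit are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.OptionalLong;

// Hypothetical sketch; not the actual QueryOptions codec from the patch.
final class TimeoutOptionCodec {
    // Assumed flag bit and minimum protocol version, for illustration only.
    static final int TIMEOUT_FLAG = 0x80;
    static final int MIN_VERSION = 5;

    // Writes one flags byte, followed by the timeout (assumed to be in
    // milliseconds) only when it is present and the protocol allows it.
    static ByteBuffer encode(OptionalLong timeoutMillis, int protocolVersion) {
        boolean include = timeoutMillis.isPresent() && protocolVersion >= MIN_VERSION;
        ByteBuffer out = ByteBuffer.allocate(include ? 1 + Long.BYTES : 1);
        out.put((byte) (include ? TIMEOUT_FLAG : 0));
        if (include) {
            out.putLong(timeoutMillis.getAsLong());
        }
        out.flip();
        return out;
    }

    // Reads the flags byte and returns the timeout only if the flag is set
    // and the protocol version is new enough; otherwise returns empty.
    static OptionalLong decode(ByteBuffer in, int protocolVersion) {
        int flags = in.get() & 0xFF;
        if (protocolVersion >= MIN_VERSION && (flags & TIMEOUT_FLAG) != 0) {
            return OptionalLong.of(in.getLong());
        }
        return OptionalLong.empty();
    }
}
```

With this shape, a pre-v5 client never sees the extra bytes, and callers have to handle the empty case explicitly instead of checking for a sentinel value.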

On the read path, the timeout is kept within the {{ReadCommand}} and referenced in {{ReadCallback.awaitResults()}}. It is also serialized within the {{ReadCommand}} so that replica nodes can use it when setting the monitoring time in {{ReadCommandVerbHandler}}. Because the time at which the query started is not propagated to the replicas, a replica can only enforce the timeout measured from when its {{MessageIn}} was constructed, not from when the query began.
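The caveat about replica-side enforcement can be made concrete with a small sketch. The class and method names here ({{ReplicaDeadline}}, {{deadlineMillis}}) are hypothetical and do not come from the patch; the only idea taken from the comment is that the replica measures the timeout from message construction.

```java
import java.util.OptionalLong;

// Hypothetical sketch of how a replica could derive its monitoring deadline.
final class ReplicaDeadline {
    // Uses the client-supplied timeout when present, else the configured default.
    static long effectiveTimeoutMillis(OptionalLong clientTimeoutMillis, long defaultMillis) {
        return clientTimeoutMillis.orElse(defaultMillis);
    }

    // The deadline is relative to when the incoming message was constructed
    // on the replica, because the query start time is not propagated.
    static long deadlineMillis(long messageConstructedAtMillis,
                               OptionalLong clientTimeoutMillis,
                               long defaultMillis) {
        return messageConstructedAtMillis
               + effectiveTimeoutMillis(clientTimeoutMillis, defaultMillis);
    }
}
```

For example, if a query with a 10 ms timeout starts on the coordinator at t=990 and the replica constructs its message at t=1000, this sketch yields a replica deadline of t=1010, which is 20 ms after the query actually began.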

On the write path, the timeout is just passed through the call stack into the {{AbstractWriteResponseHandler}}/{{AbstractPaxosCallback}} where it is referenced in the respective {{await()}} calls.
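As a rough illustration of a coordinator-side wait that honours an optional per-request timeout, the sketch below uses a {{CountDownLatch}}. {{WriteAckWaiter}} is a made-up class, not {{AbstractWriteResponseHandler}}, and the fallback to a default timeout is an assumption about how an absent client value would be handled.

```java
import java.util.OptionalLong;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a coordinator-side write response handler.
final class WriteAckWaiter {
    private final CountDownLatch acks;
    private final long timeoutMillis;

    WriteAckWaiter(int requiredAcks, OptionalLong clientTimeoutMillis, long defaultTimeoutMillis) {
        this.acks = new CountDownLatch(requiredAcks);
        // Prefer the client-supplied timeout; fall back to the default.
        this.timeoutMillis = clientTimeoutMillis.orElse(defaultTimeoutMillis);
    }

    // Called once per replica acknowledgement.
    void onAck() {
        acks.countDown();
    }

    // Returns true if enough acks arrived before the timeout elapsed.
    // On interruption, restores the interrupt flag and reports failure.
    boolean await() {
        try {
            return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Nothing in this sketch reaches the replicas: the timeout only bounds how long the coordinator waits, which matches the write-path behaviour described above.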

I investigated passing the timeout to the replicas on the write path as well. To do so, we'd need to include it in the outgoing internode message when making a write, which means either placing it into {{Mutation}} or creating some sort of wrapper around a mutation that can hold the timeout. This seemed like a very invasive change for minimal gain, since being able to abort an in-progress write didn't seem as useful as being able to abort an in-progress read.

This still requires a version bump in the internode protocol to support the change in serialization of {{ReadCommand}} (I haven't touched {{MessagingService.current_version}} yet, though). If we don't want to wait till 4.0, we can delay this part of the patch and just retain the custom timeout on the coordinator (i.e. don't serialize the timeout). Once the branch for 4.0 is available, we can modify the serialization to allow us to pass the timeout to the replicas.

I'd also like to include some dtests for this, namely to validate which timeout is being used on the coordinator. Is the accepted practice for something like this to log a message and assert on the presence of the log entry? I want to avoid relying on the actual timeout observed, since that can make the test flaky.

> Make the Client API support passing down timeouts
> -------------------------------------------------
>
>                 Key: CASSANDRA-2848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2848
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Goffinet
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 2848-trunk-v2.txt, 2848-trunk.txt
>
>
> Having a max server RPC timeout is good for the worst case, but many applications that have middleware in front of Cassandra might have higher timeout requirements. In a fail-fast environment, if my application, starting at say the front-end, only has 20ms to process a request and must connect to X services down the stack, by the time it hits Cassandra we might only have 10ms. I propose we provide the ability to optionally specify the timeout on each call we make.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)