Posted to commits@cassandra.apache.org by "Yifan Cai (Jira)" <ji...@apache.org> on 2019/12/02 19:54:00 UTC

[jira] [Commented] (CASSANDRA-15350) Add CAS “uncertainty” and “contention" messages that are currently propagated as a WriteTimeoutException.

    [ https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986310#comment-16986310 ] 

Yifan Cai commented on CASSANDRA-15350:
---------------------------------------

[~ifesdjeen] and [~spod], big thanks for reviewing the patch!

Renaming the exception to {{CasWriteStalledException}} and using the suggested rephrasing of the description for {{CAS_UNCERTAINTY}} both sound good. 
 The meaning of {{CasWriteUncertainty}} is vague and, as was pointed out, the WTE indicates that the result is uncertain too. {{CasWriteStalled}} describes what happened better. 
 I meant to put _Paxos read_ when writing the description for the {{CAS_UNCERTAINTY}} clause. I will update the description, taking the suggested wording into account. 

Regarding the cross-version scenarios, I may be wrong, but my current understanding is that the ErrorMessage is *not* involved in internode messaging. ErrorMessage, derived from {{org.apache.cassandra.transport.Message.Response}}, is client-facing. When a sub-V5 (i.e. V4) client connects to the V5 server and gets the {{CasWriteTimeoutException}}, the server-side encoding makes sure to produce a backward-compatible response, so the sub-V5 client can still understand it. 
 The {{decode}} method in {{ErrorMessage}} seems to be useful only for {{org.apache.cassandra.transport.SimpleClient/Client}}, which is not started in the Cassandra server.
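To illustrate the kind of backward-compatible encoding I mean, here is a minimal sketch. It is not the code in the patch: the class name, the enum, and the {{codeForClient}} helper are all hypothetical, and it assumes the server simply reports a plain write timeout to clients that negotiated a protocol version below V5.

```java
// Hypothetical sketch, not Cassandra's actual ErrorMessage encoding.
class ErrorCodeDowngradeDemo {
    // Assumed protocol version at which the new CAS error codes become visible.
    static final int V5 = 5;

    // Made-up error kinds, for illustration only.
    enum Code { WRITE_TIMEOUT, CAS_WRITE_STALLED }

    // Pick the error code a client is able to decode, based on the
    // protocol version it negotiated with the server.
    static Code codeForClient(Code actual, int clientProtocolVersion) {
        if (actual == Code.CAS_WRITE_STALLED && clientProtocolVersion < V5) {
            // Older clients do not know the new code, so fall back to the
            // write timeout they already understand.
            return Code.WRITE_TIMEOUT;
        }
        return actual;
    }
}
```

With this sketch, {{codeForClient(Code.CAS_WRITE_STALLED, 4)}} yields {{WRITE_TIMEOUT}}, while a V5 client gets {{CAS_WRITE_STALLED}} unchanged.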
{quote}Unless you're submitting patches to 2.2, 3.0, and 3.11, let's roll back changes to IMessageFilters, since their API has to be binary compatible with older versions.
{quote}
The test cases in {{CasWriteTest}} rely on the message intercept function. I will back-port the changes to IMessageFilters to the prior versions.
{quote}Should we add timeout tests for responses as well as requests in CasWriteTest?
{quote}
Sure, sounds good.
{quote}Is it possible to try and simplify testCasWriteTimeoutDueToContention, can we achieve contention with N threads?
{quote}
The test does achieve contention with N threads (1 thread per client). In addition, the scenario was carefully crafted to be deterministic and aims to produce the same kind of contention.
{quote}both tests peer quite a lot into implementation internals.
{quote}
The test cases mainly manipulate the internode networking to introduce latency/partition. In order to produce (and always produce) a rare contention scenario, I think that fine-grained control is necessary.
{quote}In ErrorMessage#decode, there are extra brackets around WRITE_TIMEOUT clause. You can remove those and fix indentation. Same happens in CAS_UNCERTAINTY case.
{quote}
Removing the brackets in the {{switch-case}} statements gives a compile error, since we define variables with the same name in more than one case. The brackets scope the variables to their own case.
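A minimal sketch of the scoping issue, separate from the actual {{ErrorMessage#decode}} code. The class name, the constant values, and the {{describe}} helper are made up for illustration; only the language rule is the point. Without the braces, both declarations of {{detail}} would share the scope of the whole switch block and the second one would be rejected as a duplicate.

```java
// Hypothetical sketch; constants and names are illustrative only.
class SwitchScopeDemo {
    static final int WRITE_TIMEOUT = 1;
    static final int CAS_UNCERTAINTY = 2;

    static String describe(int code, int received, int blockFor) {
        switch (code) {
            case WRITE_TIMEOUT: {
                // 'detail' is visible only inside this block.
                String detail = "timeout: " + received + "/" + blockFor;
                return detail;
            }
            case CAS_UNCERTAINTY: {
                // Reusing the name is legal because this case has its own block.
                String detail = "uncertain: " + received + "/" + blockFor;
                return detail;
            }
            default:
                return "other";
        }
    }
}
```

Here {{describe(WRITE_TIMEOUT, 1, 2)}} returns {{"timeout: 1/2"}}. The alternative to the braces would be giving each case's variables distinct names.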
{quote}If we add comments for activate and deactivate for off/on, maybe it's worth to call those off/on?
{quote}
Do you mean rename the method to activate/deactivate, or change the comments to on/off? Both sound good to me.

> Add CAS “uncertainty” and “contention" messages that are currently propagated as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Lightweight Transactions
>            Reporter: Alex Petrov
>            Assignee: Yifan Cai
>            Priority: Normal
>              Labels: protocolv5, pull-request-available
>         Attachments: Utf8StringEncodeBench.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, CAS uncertainty introduced in https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as WriteTimeout. One of the conditions under which it manifests is when there’s at least one acceptor that has accepted the value, which means that this value _may_ still get accepted during a later round, despite the proposer failure. A similar problem happens with CAS contention, which is also indistinguishable from the “regular” timeout, even though it is visible in metrics correctly.


