Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/11/16 16:47:11 UTC

[jira] [Commented] (CASSANDRA-8589) Reconciliation in presence of tombstone might yield stale data

    [ https://issues.apache.org/jira/browse/CASSANDRA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006812#comment-15006812 ] 

Sylvain Lebresne commented on CASSANDRA-8589:
---------------------------------------------

I took a few minutes to adapt our existing short read dtest to the example in the description (the result is [here|https://github.com/pcmanus/cassandra-dtest/commits/8933_test]), and this is indeed a problem in pre-3.0. In 3.0, however, CASSANDRA-8099 does solve it and the test passes there, so I'm adjusting the fix version accordingly.

> Reconciliation in presence of tombstone might yield stale data
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-8589
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8589
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>             Fix For: 2.1.x, 2.2.x
>
>
> Consider 3 replicas A, B, C (so RF=3) and consider that we do the following sequence of actions at {{QUORUM}}, where I indicate the replicas acknowledging each operation (and let's assume that a replica that doesn't ack is a replica that doesn't get the update):
> {noformat}
> CREATE TABLE test (k text, t int, v int, PRIMARY KEY (k, t))
> INSERT INTO test(k, t, v) VALUES ('k', 0, 0); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 1, 1); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 2, 2); // acked by A, B and C
> DELETE FROM test WHERE k='k' AND t=1;         // acked by A and C
> UPDATE test SET v = 3 WHERE k='k' AND t=2;    // acked by B and C
> SELECT * FROM test WHERE k='k' LIMIT 2;       // answered by A and B
> {noformat}
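> Every operation has achieved quorum, but on the last read, A will respond {{0->0, tombstone 1, 2->2}} and B will respond {{0->0, 1->1}}. A's tombstone for 1 shadows B's {{1->1}}, and B never returned row 2 because it had already reached the limit, so row 2 comes from A alone, which missed the update. As a consequence, we'll answer {{0->0, 2->2}}, which is incorrect: we should respond {{0->0, 2->3}}.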
> Put another way, if we have a limit, every replica honors that limit, but since tombstones can "suppress" results from other nodes, we may have some cells for which we actually don't get a quorum of responses (even though we globally have a quorum of replica responses).
> In practice, this probably occurs rather rarely, so the "simpler" fix is probably to do something similar to the "short reads protection": detect when this could have happened (based on how replica responses are reconciled) and do an additional request in that case. That detection will have potential false positives, but I suspect we can be precise enough that those false positives will be very rare. We should nonetheless track how often this code gets triggered, and if we see that it's more often than we think, we could pro-actively bump user limits internally to reduce those occurrences.
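The failure in the description can be reproduced with a small toy model, sketched here in Python under stated assumptions; it is not Cassandra's code. Each replica is modelled as a dict from clustering key {{t}} to a {{(timestamp, value)}} pair, with {{None}} standing for a row tombstone, and timestamps 1 to 5 are assumed for the five writes in order. The helpers {{replica_read}}, {{reconcile}} and {{keys_lacking_coverage}} are hypothetical names. The last one illustrates one possible detection rule in the spirit of the short reads protection: a replica that returned exactly {{limit}} live rows may have stopped early, so any result row past its last returned key was not confirmed by that replica.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (assumption, not Cassandra internals): clustering key t maps to
# (write_timestamp, value); a value of None is a row tombstone.
Cell = Tuple[int, Optional[int]]
Partition = Dict[int, Cell]
Response = List[Tuple[int, Cell]]


def replica_read(data: Partition, limit: int) -> Response:
    """Return entries in clustering order until `limit` live rows are seen.
    Tombstones are shipped too but do not count towards the limit."""
    out: Response = []
    live = 0
    for t in sorted(data):
        if live == limit:
            break
        ts, value = data[t]
        out.append((t, (ts, value)))
        if value is not None:
            live += 1
    return out


def reconcile(responses: List[Response], limit: int) -> Dict[int, int]:
    """Per key, keep the entry with the highest timestamp (ties keep the first
    one seen), then drop tombstones and apply the user limit."""
    merged: Dict[int, Cell] = {}
    for response in responses:
        for t, cell in response:
            if t not in merged or cell[0] > merged[t][0]:
                merged[t] = cell
    result: Dict[int, int] = {}
    for t in sorted(merged):
        value = merged[t][1]
        if value is not None and len(result) < limit:
            result[t] = value
    return result


def keys_lacking_coverage(responses: List[Response], result: Dict[int, int],
                          limit: int) -> List[int]:
    """Hypothetical check: flag result rows that lie past the last key of a
    replica that hit the limit, i.e. rows that replica never answered for."""
    suspicious = set()
    for response in responses:
        live = sum(1 for _, (_, v) in response if v is not None)
        if live < limit:
            continue  # replica returned everything it has: full coverage
        last_key = response[-1][0]
        suspicious.update(t for t in result if t > last_key)
    return sorted(suspicious)


# Inserts at timestamps 1-3, DELETE at 4, UPDATE at 5.
replica_a = {0: (1, 0), 1: (4, None), 2: (3, 2)}  # missed the UPDATE
replica_b = {0: (1, 0), 1: (2, 1), 2: (5, 3)}     # missed the DELETE

LIMIT = 2
responses = [replica_read(replica_a, LIMIT), replica_read(replica_b, LIMIT)]
result = reconcile(responses, LIMIT)
print(result)                                           # {0: 0, 2: 2} (stale)
print(keys_lacking_coverage(responses, result, LIMIT))  # [2]

# Additional request with a bumped internal limit.
retried = [replica_read(replica_a, LIMIT + 1), replica_read(replica_b, LIMIT + 1)]
fixed = reconcile(retried, LIMIT)
print(fixed)                                             # {0: 0, 2: 3}
print(keys_lacking_coverage(retried, fixed, LIMIT + 1))  # []
```

In this model, B stops after {{0->0, 1->1}}, so row 2 is flagged, and the retry with one extra row per replica yields the correct {{0->0, 2->3}}. The rule can still produce false positives: a flagged row would trigger the extra request even when the replica that stopped early holds no newer version of it.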



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)