Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Jira)" <ji...@apache.org> on 2020/05/12 15:46:00 UTC

[jira] [Updated] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

     [ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-15805:
-----------------------------------------
     Bug Category: Parent values: Correctness(12982)Level 1 values: Recoverable Corruption / Loss(12986)
       Complexity: Normal
    Discovered By: User Report
         Severity: Critical
         Assignee: Sylvain Lebresne
           Status: Open  (was: Triage Needed)

> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/SSTable
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and {{UnfilteredDeserializer.OldFormatDeserializer}}) does not correctly handle the case where a range tombstone covering multiple rows interacts with a collection tombstone.
> A simple example of this problem is running the following on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Insert a row covered by the previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2;
> // Delete the collection column b of that inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' AND c2 = 'X';
> {noformat}
> If these statements are run on 2.X (with everything either flushed into the same sstable or compacted together), reading the data after upgrading to 3.X will return the inserted row duplicated (one part containing the {{a}} column, the other the {{c}} one).
> Note that this is _not_ a duplicate of CASSANDRA-15789: it reproduces even with that ticket's fix to {{LegacyLayout}}. The code added in CASSANDRA-15789 to force-merge duplicated rows when they are produced _will_ end up fixing this as a side effect (assuming no variation of this problem leads to visible issues other than duplicated rows). Even so, I think we would still rather fix the source of the issue.
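The ticket only describes the CASSANDRA-15789 safety net at a high level ("force merging duplicated rows if they are produced"). The following is a toy Java sketch of that general idea, not Cassandra's actual code: the `Row` and `Cell` types, the `"A:X"` clustering encoding and `mergeAdjacentDuplicates` are all hypothetical. It assumes rows arrive sorted by clustering (so the two halves of a split row are adjacent), resolves column conflicts by keeping the higher timestamp, and ignores tombstones entirely. It uses Java records (Java 16+).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateRowMerger {

    /** Hypothetical toy cell: a value and its write timestamp (not Cassandra's Cell). */
    record Cell(String value, long timestamp) {}

    /** Hypothetical toy row: a clustering key such as "A:X" and its cells by column name. */
    record Row(String clustering, Map<String, Cell> cells) {}

    /**
     * Merges consecutive rows sharing the same clustering. Assumes the input is sorted
     * by clustering, so duplicates are adjacent. On a column conflict, the cell with the
     * higher timestamp wins. Tombstones are not modeled.
     */
    static List<Row> mergeAdjacentDuplicates(List<Row> sorted) {
        List<Row> out = new ArrayList<>();
        for (Row row : sorted) {
            Row last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.clustering().equals(row.clustering())) {
                // Same clustering as the previous row: fold this row's cells into it.
                Map<String, Cell> merged = new LinkedHashMap<>(last.cells());
                row.cells().forEach((col, cell) ->
                        merged.merge(col, cell, (a, b) -> a.timestamp() >= b.timestamp() ? a : b));
                out.set(out.size() - 1, new Row(row.clustering(), merged));
            } else {
                // New clustering: copy the row so later merges never mutate the input.
                out.add(new Row(row.clustering(), new LinkedHashMap<>(row.cells())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The symptom from the report: row (c1='A', c2='X') split in two,
        // one half holding column a, the other holding column c.
        List<Row> split = List.of(
                new Row("A:X", Map.of("a", new Cell("foo", 2))),
                new Row("A:X", Map.of("c", new Cell("bar", 2))));
        List<Row> merged = mergeAdjacentDuplicates(split);
        System.out.println(merged.size() + " row(s): " + merged.get(0).cells().keySet());
    }
}
```

As the ticket points out, a merge like this only hides the symptom after the fact; it does not fix how the legacy reader splits the row in the first place.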



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
