You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Jira)" <ji...@apache.org> on 2020/05/12 15:45:00 UTC

[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

    [ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105536#comment-17105536 ] 

Sylvain Lebresne commented on CASSANDRA-15805:
----------------------------------------------

To understand why this happens, let me write down the atoms that the example of the description generates on 2.X (using a simplified representation that I hope is clear enough):
{noformat}
atom1: RT([A:_, A:X:b:_])@1,     // beginning of all 'A' rows to beginning of A:X's b column
atom2: Cell(A:X:)@2,             // row marker for A:X
atom3: Cell(A:X:a=foo)@2,        // value of a in A:X
atom4: RT([A:X:b:_, A:X:b:!])@3, // collection tombstone for b in A:X's
atom5: RT([A:X:b:!, A:!])@1,     // remainder of covering RT, from end of b in A:X to end of all 'A' rows
atom6: Cell(A:X:c=bar)@2         // value of c in A:X
{noformat}
Those atoms are deserialized into {{LegacyCell}} and {{LegacyRangeTombstone}} on 3.X as:
{noformat}
atom1: RT(Bound(INCL_START_BOUND(A), collection=null)-Bound(EXCL_END_BOUND(A:B), collection=null), deletedAt=1, localDeletion=1589204864)
atom2: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=null, collElt=null), v=, ts=2, ldt=2147483647, ttl=0)
atom3: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=a, collElt=null), v=foo, ts=2, ldt=2147483647, ttl=0)
atom4: RT(Bound(INCL_START_BOUND(A:X), collection=b)-Bound(INCL_END_BOUND(A:X), collection=b), deletedAt=3, localDeletion=1589204864)
atom5: RT(Bound(EXCL_START_BOUND(A:X), collection=null)-Bound(INCL_END_BOUND(A), collection=null), deletedAt=1, localDeletion=1589204864)
atom6: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=c, collElt=null), v=bar, ts=2, ldt=2147483647, ttl=0)
{noformat}

I'll point out that those are a direct translation of the 2.X atoms except for {{atom1}} and {{atom5}} that are slightly different:
* instead of {{atom1}} stopping at the beginning of the row {{b}} column, it extends to the end of the row.
* and instead of {{atom5}} staring after that {{b}} column, it starts after the row. Do note however that the order of atoms is still the one above, so that atom is effectively out-of-order.

The reason for those differences is the logic [at the beginning of {{LegacyLayout.RangeTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1883], whose comment is trying to explain, but is basically due to the legacy layer having to map all 2.X RTs into either a 3.X range tombstone (so one over multiple rows), a row tombstone or a collection one.

Anyway, as mentioned above, the problem is that {{atom5}} is out of order.  What currently happens is that when {{atom5}} is encountered by {{UnfilteredDeserialized.OldFormatDeserializer}}, it will be passed to the {{CellGrouper}} currently grouping the row, and will end up in the [{{CellGrouper#addGenericTombstone}} method|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1544].  But, because that atom starts strictly after the row being grouped, the method returns {{false}} and the row is generated a first time. Later, we get {{atom6}} which restarts the row with the value of column {{c}}, after which it is generated a second time.


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/SSTable
>            Reporter: Sylvain Lebresne
>            Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly the case where a range tombstone covering multiple rows interacts with a collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same table or compacted together), then this will result in the inserted row being duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, the additional code added to CASSANDRA-15789 to force merging duplicated rows if they are produced _will_ end up fixing this as a consequence (assuming there is no variation of this problem that leads to other visible issues than duplicated rows). That said, I "think" we'd still rather fix the source of the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org