You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Jira)" <ji...@apache.org> on 2020/05/13 12:52:00 UTC

[jira] [Comment Edited] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

    [ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106271#comment-17106271 ] 

Sylvain Lebresne edited comment on CASSANDRA-15805 at 5/13/20, 12:51 PM:
-------------------------------------------------------------------------

There is probably a few variations for how to fix this, but what feels the more intuitive to me is to:
 # slightly modify the ctor of {{LegacyRangeTombstone}} so the {{atom5}} of my previous comment use an inclusive start. Mostly because that make the rest of the logic a bit simpler imo (we can still assume that when we get a atom whose cluster strictly sort after the currently grouper row, we're done with that row).
 # modify {{UnfilteredDeserializer.OldFormatDeserializer}} so that when, while grouping a row, it encounters a RT that covers it, is "splits" that into a row tombstone covering the row, and push back the handling of the rest of the tombstone to when the row is truly finished.

I've pushed a patch doing so for 3.0 below (thanks to [~marcuse] for triggering CI on this):
||branch||unit tests||dtests||jvm dtests||jvm upgrade dtest||
|[https://github.com/pcmanus/cassandra/commits/C-15805-3.0]|[utests|https://circleci.com/gh/krummas/cassandra/3289]|[vnodes|https://circleci.com/gh/krummas/cassandra/3292] [no-vnodes|https://circleci.com/gh/krummas/cassandra/3293]|[jvm dtests|https://circleci.com/gh/krummas/cassandra/3290]|[upgrade dtests|https://circleci.com/gh/krummas/cassandra/3294]|

I'll note that the branch contains another small fix, that is a lot less important. Namely, [the return at the beginning of {{CellGrouper#addCollectionTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1516] should be {{true}}, not {{false}} as it currently is. The test in question checks if a collection tombstone happens to not be selected by the query we're decoding data for. If it isn't included, we can ignore the tombstone, so we can/should return, but not with {{false}} as that imply the row is finished, which it probably isn't.

Now the reason I say that last problem is less important is that in practice, only thrift queries should run into this (since CQL queries queries all column effectively) so even if we duplicate the row here, it won't matter when the result is converted back to thrift (besides, having a collection tombstone implies that this is a thrift query on a CQL table, which is dodgy in the first place). Anyway, the code is still obviously wrong and the fix trivial, so included it it nonetheless (in a separate commit).


was (Author: slebresne):
There is probably a few variations for how to fix this, but what feels the more intuitive to me is to:
# slightly modify the ctor of {{LegacyRangeTombstone}} so the {{atom5}} of my previous comment use an inclusive start. Mostly because that make the rest of the logic a bit simpler imo (we can still assume that when we get a atom whose cluster strictly sort after the currently grouper row, we're done with that row).
# modify {{UnfilteredDeserializer.OldFormatDeserializer}} so that when, while grouping a row, it encounters a RT that covers it, is "splits" that into a row tombstone covering the row, and push back the handling of the rest of the tombstone to when the row is truly finished.

I've pushed a patch doing so for 3.0 below (thanks to [~markus] for triggering CI on this):
||branch||unit tests||dtests||jvm dtests||jvm upgrade dtest||
| https://github.com/pcmanus/cassandra/commits/C-15805-3.0 | [utests|https://circleci.com/gh/krummas/cassandra/3289] | [vnodes|https://circleci.com/gh/krummas/cassandra/3292] [no-vnodes|https://circleci.com/gh/krummas/cassandra/3293] | [jvm dtests|https://circleci.com/gh/krummas/cassandra/3290] | [upgrade dtests|https://circleci.com/gh/krummas/cassandra/3294] |

I'll note that the branch contains another small fix, that is a lot less important. Namely, [the return at the beginning of {{CellGrouper#addCollectionTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1516] should be {{true}}, not {{false}} as it currently is. The test in question checks if a collection tombstone happens to not be selected by the query we're decoding data for. If it isn't included, we can ignore the tombstone, so we can/should return, but not with {{false}} as that imply the row is finished, which it probably isn't.

Now the reason I say that last problem is less important is that in practice, only thrift queries should run into this (since CQL queries queries all column effectively) so even if we duplicate the row here, it won't matter when the result is converted back to thrift (besides, having a collection tombstone implies that this is a thrift query on a CQL table, which is dodgy in the first place). Anyway, the code is still obviously wrong and the fix trivial, so included it it nonetheless (in a separate commit).


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/SSTable
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly the case where a range tombstone covering multiple rows interacts with a collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same table or compacted together), then this will result in the inserted row being duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, the additional code added to CASSANDRA-15789 to force merging duplicated rows if they are produced _will_ end up fixing this as a consequence (assuming there is no variation of this problem that leads to other visible issues than duplicated rows). That said, I "think" we'd still rather fix the source of the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org