Posted to commits@cassandra.apache.org by "Blake Eggleston (JIRA)" <ji...@apache.org> on 2018/11/01 17:52:00 UTC

[jira] [Comment Edited] (CASSANDRA-14861) sstable min/max metadata can cause data loss

    [ https://issues.apache.org/jira/browse/CASSANDRA-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671870#comment-16671870 ] 

Blake Eggleston edited comment on CASSANDRA-14861 at 11/1/18 5:51 PM:
----------------------------------------------------------------------

|[3.0|https://github.com/bdeggleston/cassandra/tree/14861-3.0]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14861-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/14861-3.11]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14861-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/14861-trunk]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/14861-trunk]|

This adds a minor sstable version to 3.x and changes two behaviors. First, when reading metadata for pre-md sstables, -only the first clustering value is loaded into the min/max values and the rest are discarded- the min/max values are discarded. Second, when writing new sstables, the size of the min/max values written is limited by the length of the shortest range tombstone (RT) clustering.

edit: min/max values from legacy sstables need to be discarded; otherwise, open-ended RTs (e.g. DELETE WHERE c < 100) would still have this problem.
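A minimal sketch of the two changes, assuming a simplified model in Python rather than Cassandra's own code: a clustering is a tuple of ints, an RT bound is a possibly shorter tuple, and `()` is an open-ended bound. `FixedMinMaxTracker`, `bounds_for_read`, `compare_prefix`, `may_intersect` and the lexical comparison of version strings are all hypothetical. The example also shows why legacy min/max values have to be discarded rather than cut down to the first clustering value: for `DELETE WHERE c < 100` the open start bound updates nothing, so even the first column's stored minimum is too high.

```python
from typing import List, Optional, Tuple

Prefix = Tuple[int, ...]  # a clustering prefix; () is an open-ended bound


def compare_prefix(a: Prefix, b: Prefix) -> int:
    """Compare only the columns both prefixes share; 0 means 'may overlap'."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0


def may_intersect(sst_min: Prefix, sst_max: Prefix, start: Prefix, end: Prefix) -> bool:
    """Simplified filter: skip an sstable only if it lies wholly outside [start, end]."""
    return compare_prefix(sst_min, end) <= 0 and compare_prefix(sst_max, start) >= 0


class FixedMinMaxTracker:
    """Hypothetical writer-side tracker that truncates min/max to the shortest RT bound."""

    def __init__(self, num_columns: int) -> None:
        self.min: List[Optional[int]] = [None] * num_columns
        self.max: List[Optional[int]] = [None] * num_columns
        self.shortest_rt = num_columns  # length of the shortest RT bound written

    def update(self, prefix: Prefix, is_rt_bound: bool = False) -> None:
        for i, v in enumerate(prefix):
            if self.min[i] is None or v < self.min[i]:
                self.min[i] = v
            if self.max[i] is None or v > self.max[i]:
                self.max[i] = v
        if is_rt_bound:
            self.shortest_rt = min(self.shortest_rt, len(prefix))

    def finish(self) -> Tuple[Prefix, Prefix]:
        # Keep no more columns than the shortest RT bound has, and stop at the
        # first column that no row or marker ever reached.
        n = self.shortest_rt
        for i in range(n):
            if self.min[i] is None:
                n = i
                break
        return tuple(self.min[:n]), tuple(self.max[:n])


def bounds_for_read(version: str, stored_min: Prefix, stored_max: Prefix) -> Tuple[Prefix, Prefix]:
    # Assumption: 3.x version strings ("mc", "md", ...) compare lexically and
    # "md" is the new minor version. Older min/max metadata is ignored entirely.
    if version < "md":
        return (), ()
    return stored_min, stored_max


# Clustering (c, d). The sstable holds row (200, 1) and the RT written by
# DELETE ... WHERE c < 100, whose bounds are () (open start) and (100,).
tracker = FixedMinMaxTracker(2)
tracker.update((200, 1))
tracker.update((), is_rt_bound=True)
tracker.update((100,), is_rt_bound=True)
fixed_bounds = tracker.finish()
print(fixed_bounds)                                # ((), ())
print(may_intersect(*fixed_bounds, (50,), (50,)))  # True: the RT is read for c = 50

# What a pre-md writer would have stored for the same sstable:
legacy_min, legacy_max = (100, 1), (200, 1)
print(may_intersect(legacy_min, legacy_max, (50,), (50,)))          # False
print(may_intersect(legacy_min[:1], legacy_max[:1], (50,), (50,)))  # False, even first-column-only
print(bounds_for_read("mc", legacy_min, legacy_max))                # ((), ())
```

In this model, truncation is conservative. It can only make the filter include more sstables, never fewer, so the cost is some lost read-path pruning rather than correctness.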


was (Author: bdeggleston):

This adds a minor sstable version to 3.x and changes two behaviors. First, when reading metadata for pre-md sstables, only the first clustering value is loaded into the min/max values and the rest are discarded. Second, when writing new sstables, the size of the min/max values written is limited by the length of the shortest RT clustering.

> sstable min/max metadata can cause data loss
> --------------------------------------------
>
>                 Key: CASSANDRA-14861
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14861
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 3.0.18, 3.11.4, 4.0
>
>
> There’s a bug in the way we filter sstables in the read path that can cause sstables containing relevant range tombstones to be excluded from reads. This can cause data resurrection for an individual read and, if compaction timing is right, permanent resurrection via read repair.
> We track the min and max clustering values when writing an sstable so we can avoid reading from sstables that don’t contain the clustering values we’re looking for in a given read. The min/max for each clustering column is updated for each row/RT marker we write. In the case of range tombstone markers, though, we only update the min/max for the clustering values they contain, which is almost never the full set of clustering values. This leaves a min that is above, and a max that is below, the real range covered by the range tombstones contained in the sstable.
> For instance, assume we’re writing an sstable for a table with 3 clustering columns. The current min clustering is 5:6:7. We write an RT marker for a range tombstone that deletes any row with the value 4 in the first clustering column, so the open marker is [4:]. This would make the new min clustering 4:6:7 when it should really be 4:. If we do a read for clustering values of 4:5 and lower, we’ll exclude this sstable and its range tombstone, resurrecting any data that this tombstone would have deleted.
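The example above can be reproduced with a small, simplified model. This is a hypothetical Python sketch, not Cassandra's code: clusterings are tuples of ints, an RT marker such as [4:] is the shorter tuple (4,), and `BuggyMinMaxTracker`, `compare_prefix` and `may_intersect` are illustrative names. Treating a shared prefix as a possible overlap is an assumption about how the read-path filter compares min/max values.

```python
from typing import List, Optional, Tuple

Prefix = Tuple[int, ...]  # a clustering, or a shorter clustering prefix such as (4,)


class BuggyMinMaxTracker:
    """Hypothetical model of per-column min/max tracking as described in the issue."""

    def __init__(self, num_columns: int) -> None:
        self.min: List[Optional[int]] = [None] * num_columns
        self.max: List[Optional[int]] = [None] * num_columns

    def update(self, prefix: Prefix) -> None:
        # Rows and RT markers both go through here, but a marker like [4:] only
        # carries a prefix, so only its leading columns are updated.
        for i, v in enumerate(prefix):
            if self.min[i] is None or v < self.min[i]:
                self.min[i] = v
            if self.max[i] is None or v > self.max[i]:
                self.max[i] = v


def compare_prefix(a: Prefix, b: Prefix) -> int:
    """Compare only the columns both prefixes share; 0 means 'may overlap'."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0


def may_intersect(sst_min: Prefix, sst_max: Prefix, start: Prefix, end: Prefix) -> bool:
    """Skip an sstable only if it lies wholly before or after the read slice."""
    return compare_prefix(sst_min, end) <= 0 and compare_prefix(sst_max, start) >= 0


tracker = BuggyMinMaxTracker(3)
tracker.update((5, 6, 7))  # a row: min is 5:6:7
tracker.update((4,))       # RT open marker [4:
tracker.update((4,))       # RT close marker 4:]
sst_min, sst_max = tuple(tracker.min), tuple(tracker.max)
print(sst_min)                                      # (4, 6, 7), not 4:
print(may_intersect(sst_min, sst_max, (), (4, 5)))  # False: sstable and its RT are skipped
print(may_intersect((4,), sst_max, (), (4, 5)))     # True with the correct min 4:
```

The read slice `()` to `(4, 5)` stands for "clustering values of 4:5 and lower". Because 6 > 5 in the second column, the mixed min 4:6:7 appears to sort after the whole slice, even though the range tombstone covers every clustering starting with 4.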



