Posted to commits@cassandra.apache.org by "Andy Tolbert (JIRA)" <ji...@apache.org> on 2015/12/07 04:33:10 UTC
[jira] [Created] (CASSANDRA-10822) SSTable data loss when upgrading with row tombstone present
Andy Tolbert created CASSANDRA-10822:
----------------------------------------
Summary: SSTable data loss when upgrading with row tombstone present
Key: CASSANDRA-10822
URL: https://issues.apache.org/jira/browse/CASSANDRA-10822
Project: Cassandra
Issue Type: Bug
Reporter: Andy Tolbert
I ran into an issue when upgrading from 2.1.11 to 3.0.0 (and also to the cassandra-3.0 branch): within a partition that contains a row tombstone, the rows following the tombstone are lost.
Here's a scenario that reproduces the issue.
Using ccm, create a single-node cluster at 2.1.11:
{{ccm create -n 1 -v 2.1.11 -s financial}}
Run the following queries to create the schema, populate some data, and then delete the data for November:
{noformat}
drop keyspace if exists financial;
create keyspace if not exists financial with replication = {'class': 'SimpleStrategy', 'replication_factor' : 1 };
create table if not exists financial.symbol_history (
    symbol text,
    name text static,
    year int,
    month int,
    day int,
    volume bigint,
    close double,
    open double,
    low double,
    high double,
    primary key((symbol, year), month, day)
) with CLUSTERING ORDER BY (month desc, day desc);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 1, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 2, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 3, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 4, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 5, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 6, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 7, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 8, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 9, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 10, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 11, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 12, 1, 100);
delete from financial.symbol_history where symbol='CORP' and year = 2004 and month=11;
{noformat}
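The twelve inserts above differ only in the month, so they can also be generated rather than typed. The following is a minimal sketch, not part of the original reproduction: the function name gen_inserts is hypothetical, and the output is meant to be pasted into cqlsh after the schema statements and before the delete.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): prints one insert per month,
# 1 through 12, with the same values as the statements listed above.
gen_inserts() {
    month=1
    while [ "$month" -le 12 ]; do
        printf "insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, %d, 1, 100);\n" "$month"
        month=$((month + 1))
    done
}

gen_inserts
```

Changing the loop bound or the day value is an easy way to check whether the loss depends on the number of rows that follow the tombstone.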
Flush and run sstable2json on the sole Data.db file:
{noformat}
ccm node1 flush
sstable2json /path/to/file.db
{noformat}
The output should look like the following:
{code:json}
[
{"key": "CORP:2004",
 "cells": [["::name","MegaCorp",1449457517033030],
           ["12:1:","",1449457517033030],
           ["12:1:volume","100",1449457517033030],
           ["11:_","11:!",1449457564983269,"t",1449457564],
           ["10:1:","",1449457516313738],
           ["10:1:volume","100",1449457516313738],
           ["9:1:","",1449457516310205],
           ["9:1:volume","100",1449457516310205],
           ["8:1:","",1449457516235664],
           ["8:1:volume","100",1449457516235664],
           ["7:1:","",1449457516233535],
           ["7:1:volume","100",1449457516233535],
           ["6:1:","",1449457516231458],
           ["6:1:volume","100",1449457516231458],
           ["5:1:","",1449457516228307],
           ["5:1:volume","100",1449457516228307],
           ["4:1:","",1449457516225415],
           ["4:1:volume","100",1449457516225415],
           ["3:1:","",1449457516222811],
           ["3:1:volume","100",1449457516222811],
           ["2:1:","",1449457516220301],
           ["2:1:volume","100",1449457516220301],
           ["1:1:","",1449457516210758],
           ["1:1:volume","100",1449457516210758]]}
]
{code:json}
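To get a baseline for the number of rows that should survive the upgrade, the row markers in this output can be counted. This is a rough sketch that assumes the 2.1 sstable2json layout shown above, where each live CQL row has a marker cell named "<month>:<day>:" with an empty value; the function name count_row_markers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): reads sstable2json output on
# stdin and prints the number of row-marker cells, i.e. cells whose name is
# "<month>:<day>:" and whose value is the empty string.
# The static cell ("::name"), the "volume" cells, and the range tombstone
# ("11:_") do not match the pattern.
count_row_markers() {
    grep -c -E '\["[0-9]+:[0-9]+:","",'
}
```

Used as {{sstable2json /path/to/file.db | count_row_markers}}, this prints 11 for the output above (months 1 through 10 and 12), which is the number of rows a correct upgrade should still return.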
Prepare for the upgrade:
{noformat}
ccm node1 nodetool snapshot financial
ccm node1 nodetool drain
ccm node1 stop
{noformat}
Upgrade to cassandra-3.0 and start the node:
{noformat}
ccm node1 setdir -v git:cassandra-3.0
ccm node1 start
{noformat}
Run the following query in cqlsh and observe that only 1 row is returned! It appears that all data following November is gone.
{noformat}
cqlsh> select * from financial.symbol_history;

 symbol | year | month | day | name     | close | high | low  | open | volume
--------+------+-------+-----+----------+-------+------+------+------+--------
   CORP | 2004 |    12 |   1 | MegaCorp |  null | null | null | null |    100
{noformat}
Upgrade the sstables and query again, and you'll observe the same problem:
{noformat}
ccm node1 nodetool upgradesstables financial
{noformat}
I modified the 2.2 version of sstable2json so that it works with 3.0 (couldn't help myself :)), and observed 2 RangeTombstoneBoundMarker occurrences for the 1 delete, with the rest of the data missing:
{code:json}
[
  {
    "key": "CORP:2004",
    "static": {
      "cells": {
        ["name","MegaCorp",1449457517033030]
      }
    },
    "rows": [
      {
        "clustering": {"month": "12", "day": "1"},
        "cells": {
          ["volume","100",1449457517033030]
        }
      },
      {
        "tombstone": ["11:*",1449457564983269,"t",1449457564]
      },
      {
        "tombstone": ["11:*",1449457564983269,"t",1449457564]
      }
    ]
  }
]
{code:json}
I'm not sure why this is happening, but I should point out that I'm using static columns and reverse clustering order here, so either could be a factor. I'll retry without static columns and with regular ordering, and update the ticket with the results.