Posted to user@cassandra.apache.org by Jonas Borgström <jo...@borgstrom.se> on 2012/03/28 10:23:46 UTC
sstable2json and resurrected rows
Hi all,
I've noticed a change in behavior between 0.8.10 and 1.0.8 when it comes
to sstable2json output and major compactions. Is this a bug or intended
behavior?
With 1.0.8:
create keyspace ks;
use ks;
create column family foo;
set foo[1][1] = 1;
nodetool -h localhost flush
sstable2json foo-hc-1-Data.db =>
{
"01": [["01","01",1332920802272000]]
}
del foo[1];
set foo[2][2] = 2;
nodetool -h localhost flush
sstable2json foo-hc-2-Data.db =>
{
"01": [],
"02": [["02","02",1332920843090000]]
}
nodetool -h localhost compact ks foo
So far so good. I expected the compacted sstable to look like foo-hc-2
(the way 0.8.10 behaves), but instead it looks like the deleted foo[1]
has been resurrected (foo[1] still reads as deleted through the
Thrift API):
sstable2json foo-hc-3-Data.db =>
{
"01": [["01","01",1332920802272000]],
"02": [["02","02",1332920843090000]]
}
So why does the sstable2json output include the full foo[1] row and
not just a tombstone?
This both wastes disk space and makes it impossible to trust the
sstable2json output.
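For anyone who wants to check their own dumps for the same symptom, here is a minimal sketch in Python. It assumes the JSON layout shown in the output above (row key mapped to a list of [name, value, timestamp] columns, with an empty list for a deleted row); the function names are made up for illustration and other Cassandra versions may lay out the dump differently.

```python
import json

# Dumps copied from the sstable2json output above (1.0.8).
HC2_DUMP = """
{
"01": [],
"02": [["02","02",1332920843090000]]
}
"""
HC3_DUMP = """
{
"01": [["01","01",1332920802272000]],
"02": [["02","02",1332920843090000]]
}
"""


def rows_with_columns(dump_text):
    """Return the sorted row keys whose column list is non-empty."""
    rows = json.loads(dump_text)
    return sorted(key for key, columns in rows.items() if columns)


def unexpected_rows(before_text, after_text):
    """Row keys that show columns after compaction but showed none before."""
    before = set(rows_with_columns(before_text))
    after = set(rows_with_columns(after_text))
    return sorted(after - before)


if __name__ == "__main__":
    # Prints ['01']: the row deleted in foo-hc-2 shows columns again in foo-hc-3.
    print(unexpected_rows(HC2_DUMP, HC3_DUMP))
```

This only compares what sstable2json prints; it says nothing about what is actually stored in the sstable.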
/ Jonas
Re: sstable2json and resurrected rows
Posted by Jonas Borgström <jo...@borgstrom.se>.
On 2012-03-31 08:45 , Zhu Han wrote:
> Did you hit the bug here?
>
> https://issues.apache.org/jira/browse/CASSANDRA-4054
Yes, it looks like it. But what confuses me most is not the sstable2json
bug but why the major compaction does not replace the deleted row data
with a tombstone.
Is that a bug or a feature?
To me it just looks like a waste of disk space...
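To make the expectation concrete, here is a toy model in Python of what "replace the deleted row data with a tombstone" would mean. This is not Cassandra's compaction code. It assumes that a row-level deletion shadows every column whose timestamp is not newer than the deletion timestamp, and the deletion timestamp used here is hypothetical because the dumps do not show it.

```python
def merge_row(columns, row_deleted_at=None):
    """Drop the columns shadowed by a row-level deletion.

    columns: list of [name, value, timestamp] entries, as in the dumps.
    row_deleted_at: timestamp of the row deletion, or None if the row is live.
    Assumption: a column is shadowed when its timestamp is not newer than
    the deletion timestamp.
    """
    if row_deleted_at is None:
        return list(columns)
    return [column for column in columns if column[2] > row_deleted_at]


# foo[1] as written in foo-hc-1.
written = [["01", "01", 1332920802272000]]

# Hypothetical deletion timestamp, chosen only to be later than the write.
hypothetical_deleted_at = 1332920802272000 + 1

if __name__ == "__main__":
    # Prints []: only the tombstone would remain for foo[1].
    print(merge_row(written, hypothetical_deleted_at))
```

Under that model the compacted row would carry no columns, which is what foo-hc-2 shows for key "01".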
/ Jonas
Re: sstable2json and resurrected rows
Posted by Zhu Han <sc...@gmail.com>.
Did you hit the bug here?
https://issues.apache.org/jira/browse/CASSANDRA-4054
best regards,
坚果云 <https://jianguopuzi.com/>, the simplest and easiest-to-use cloud storage
Unlimited space, file sync, backup and sharing!
2012/3/30 Jonas Borgström <jo...@borgstrom.se>
> Let me rephrase my question:
>
> Is it true that deleted rows will still be present in the sstable after a
> major compaction with 1.0.8 (not just tombstones)?
>
> Or did I mess up my test below?
>
> / Jonas
>
>
>
> On 2012-03-28 10:23 , Jonas Borgström wrote:
>
>> [snip: original message quoted in full]
Re: sstable2json and resurrected rows
Posted by Jonas Borgström <jo...@borgstrom.se>.
Let me rephrase my question:
Is it true that deleted rows will still be present in the sstable after
a major compaction with 1.0.8 (not just tombstones)?
Or did I mess up my test below?
/ Jonas
On 2012-03-28 10:23 , Jonas Borgström wrote:
> [snip: original message quoted in full]