Posted to user@cassandra.apache.org by Jonas Borgström <jo...@borgstrom.se> on 2012/03/28 10:23:46 UTC

sstable2json and resurrected rows

Hi all,

I've noticed a change in behavior between 0.8.10 and 1.0.8 when it comes 
to sstable2json output and major compactions. Is this a bug or intended 
behavior?

With 1.0.8:

create keyspace ks;
use ks;
create column family foo;
set foo[1][1] = 1;
nodetool -h localhost flush
sstable2json foo-hc-1-Data.db =>
{
"01": [["01","01",1332920802272000]]
}
del foo[1];
set foo[2][2] = 2;
nodetool -h localhost flush
sstable2json foo-hc-2-Data.db =>
{
"01": [],
"02": [["02","02",1332920843090000]]
}
nodetool -h localhost compact ks foo

So far so good. I expected the resulting sstable to look like 
foo-hc-2 (the way 0.8.10 behaves), but instead the deleted foo[1] 
appears to have been resurrected (foo[1] is still deleted when read 
through the Thrift API):

sstable2json foo-hc-3-Data.db =>
{
"01": [["01","01",1332920802272000]],
"02": [["02","02",1332920843090000]]
}

So why is the full foo[1] row included in the sstable2json output and 
not just a tombstone?

This both wastes disk space and makes it impossible to trust the 
sstable2json output.

/ Jonas
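
For anyone repeating this check, the comparison between the two dumps can be scripted. The sketch below is only an illustration: it assumes the sstable2json layout shown in this message (row key mapped to a list of [name, value, timestamp] columns) and treats an empty list as "no columns printed". The helper name resurrected_keys is made up for this example. It compares what the tool prints, not what is actually stored on disk.

```python
import json

# sstable2json output copied from the message above.
FLUSHED = '''{
"01": [],
"02": [["02","02",1332920843090000]]
}'''

COMPACTED = '''{
"01": [["01","01",1332920802272000]],
"02": [["02","02",1332920843090000]]
}'''


def resurrected_keys(before_json, after_json):
    """Return row keys that had no columns in the earlier dump but
    show columns again in the later one (hypothetical helper)."""
    before = json.loads(before_json)
    after = json.loads(after_json)
    return sorted(
        key for key, columns in after.items()
        if key in before and not before[key] and columns
    )


if __name__ == "__main__":
    # Compare the foo-hc-2 dump with the foo-hc-3 dump.
    print(resurrected_keys(FLUSHED, COMPACTED))
```

Run against the two dumps above, this reports row "01" as the only key that went from empty to populated.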

Re: sstable2json and resurrected rows

Posted by Jonas Borgström <jo...@borgstrom.se>.
On 2012-03-31 08:45 , Zhu Han wrote:
> Did you hit the bug here?
>
> https://issues.apache.org/jira/browse/CASSANDRA-4054

Yes, it looks like it. But what confuses me most is not the sstable2json bug 
but why the major compaction does not replace the deleted row data with 
a tombstone.

Is that a bug or a feature?

To me it just looks like a waste of disk space...

/ Jonas
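
The expectation in this message can be made concrete with a toy model of the merge. This is a sketch under stated assumptions, not Cassandra's compaction code: the Row class, the merge function and the deletion timestamp are invented for illustration (the real timestamp of "del foo[1]" does not appear in the dumps). The rule modelled is that a row tombstone shadows every column whose timestamp is not newer than the deletion time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    # column name -> (value, timestamp in microseconds)
    columns: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    # row tombstone timestamp, or None if the row was never deleted
    deleted_at: Optional[int] = None


def merge(sstables: List[Dict[str, Row]]) -> Dict[str, Row]:
    """Toy merge: newest column version wins, and the newest row
    tombstone drops every column that is not newer than it."""
    merged: Dict[str, Row] = {}
    for table in sstables:
        for key, row in table.items():
            out = merged.setdefault(key, Row())
            for name, (value, ts) in row.columns.items():
                if name not in out.columns or ts > out.columns[name][1]:
                    out.columns[name] = (value, ts)
            if row.deleted_at is not None and (
                out.deleted_at is None or row.deleted_at > out.deleted_at
            ):
                out.deleted_at = row.deleted_at
    for row in merged.values():
        if row.deleted_at is not None:
            row.columns = {
                name: (value, ts)
                for name, (value, ts) in row.columns.items()
                if ts > row.deleted_at
            }
    return merged


# Assumed deletion time, chosen to fall between the two writes.
ASSUMED_DELETE_TS = 1332920820000000

hc1 = {"01": Row({"01": ("01", 1332920802272000)})}
hc2 = {
    "01": Row(deleted_at=ASSUMED_DELETE_TS),
    "02": Row({"02": ("02", 1332920843090000)}),
}

result = merge([hc1, hc2])
```

In this model row "01" ends up with no columns and only the deletion marker, which is the foo-hc-2 style result expected from the compaction. The sketch cannot say whether 1.0.8 really keeps the shadowed column on disk or whether only the sstable2json output is misleading.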



Re: sstable2json and resurrected rows

Posted by Zhu Han <sc...@gmail.com>.
Did you hit the bug here?

https://issues.apache.org/jira/browse/CASSANDRA-4054

best regards,

坚果云 <https://jianguopuzi.com/>, the simplest and easiest-to-use cloud storage
Unlimited space, file sync, backup and sharing!


2012/3/30 Jonas Borgström <jo...@borgstrom.se>

> Let me rephrase my question:
>
> Is it true that deleted rows will still be present in the sstable after a
> major compaction with 1.0.8 (not just tombstones)?
>
> Or did I mess up my test below?
>
> / Jonas

Re: sstable2json and resurrected rows

Posted by Jonas Borgström <jo...@borgstrom.se>.
Let me rephrase my question:

Is it true that deleted rows will still be present in the sstable after 
a major compaction with 1.0.8 (not just tombstones)?

Or did I mess up my test below?

/ Jonas


On 2012-03-28 10:23 , Jonas Borgström wrote:
> Hi all,
>
> I've noticed a change in behavior between 0.8.10 and 1.0.8 when it comes
> to sstable2json output and major compactions. Is this a bug or intended
> behavior?
>
> With 1.0.8:
>
> create keyspace ks;
> use ks;
> create column family foo;
> set foo[1][1] = 1;
> nodetool -h localhost flush
> sstable2json foo-hc-1-Data.db =>
> {
> "01": [["01","01",1332920802272000]]
> }
> del foo[1];
> set foo[2][2] = 2;
> nodetool -h localhost flush
> sstable2json foo-hc-2-Data.db =>
> {
> "01": [],
> "02": [["02","02",1332920843090000]]
> }
> nodetool -h localhost compact ks foo
>
> So far so good. But now I expect the resulting sstable to look like
> foo-hc-2 (the way 0.8.10 behaves) but instead it looks like the deleted
> foo[1] has been resurrected (foo[1] is still deleted when using the
> thrift api):
>
> sstable2json foo-hc-3-Data.db =>
> {
> "01": [["01","01",1332920802272000]],
> "02": [["02","02",1332920843090000]]
> }
>
> So why is the full foo[1] row included in the sstable2json output and
> not just a tombstone?
>
> This both wastes disk space and makes it impossible to trust the
> sstable2json output.
>
> / Jonas