Posted to user@hbase.apache.org by "Buttler, David" <bu...@llnl.gov> on 2011/02/01 01:35:02 UTC

RE: Delete reveals older version of a column even when VERSIONS=1

As I understand it, old versions are not physically removed until a major compaction occurs.  A major compaction should run about once per day unless you have changed the major compaction settings; compactions also run whenever a region splits.
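
To make the difference between what a read returns and what is physically stored concrete, here is a minimal Python sketch. It is a toy model, not HBase code: the ColumnStore class and its methods are hypothetical names, and it assumes (as described in this thread) that the version limit is applied at read time and that extra cells are only dropped by a major compaction.

```python
# Toy model only: not HBase code. All names here are made up for illustration.

class ColumnStore:
    """One column's cells, kept as (timestamp, value) pairs in an append-only list."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.cells = []  # every write is retained until a major compaction

    def put(self, timestamp, value):
        self.cells.append((timestamp, value))

    def get(self):
        """Return up to max_versions newest values; the limit is applied at read time."""
        newest_first = sorted(self.cells, reverse=True)
        return [value for _, value in newest_first[: self.max_versions]]

    def major_compact(self):
        """Rewrite storage, physically dropping cells beyond max_versions."""
        self.cells = sorted(self.cells, reverse=True)[: self.max_versions]


store = ColumnStore(max_versions=1)
store.put(1296264772717, "1")
store.put(1296264795365, "2")
store.put(1296264797169, "3")

print(store.get())       # ['3']  (only the newest version is returned)
print(len(store.cells))  # 3      (but all three cells are still stored)
store.major_compact()
print(len(store.cells))  # 1      (the extra cells are gone after compaction)
```

In this model a lower VERSIONS setting still saves space compared with a higher one, but only once the compaction has run.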

Dave



-----Original Message-----
From: Mike Percy [mailto:mpercy@yahoo-inc.com] 
Sent: Friday, January 28, 2011 6:10 PM
To: user@hbase.apache.org
Subject: Re: Delete reveals older version of a column even when VERSIONS=1

Hmm... how does this relate to setting VERSIONS => '1'? By setting the number of versions to 1, are we getting some space benefit over, say, VERSIONS => '10'?

Thanks,
Mike

On Jan 28, 2011, at 5:47 PM, Ryan Rawson wrote:

> I would call it 'a surprising, perhaps unexpected consequence of our
> storage model'.
> 
> There are two types of deletes in HBase. You are doing type (a), "delete
> a single version", but you probably want type (b), "delete all versions
> in this column".
> 
> 
> 
> On Fri, Jan 28, 2011 at 5:43 PM, Mike Percy <mp...@yahoo-inc.com> wrote:
>> Hi folks,
>> I am seeing some unexpected behavior with HBase 0.20.6 when deleting columns. Our cluster has been running for some time; however, we recently upgraded from HBase 0.20.3. The family I am writing to is specified as VERSIONS => '1' when doing a describe, yet HBase appears to be maintaining several versions of the columns.
>> 
>> Below is a shell session demonstrating the problem. Is this a configuration problem, behavior as designed, or possibly a bug?
>> 
>> Thanks,
>> Mike
>> 
>> hbase(main):004:0> put 'table', 'row', 'family:qual', '1'
>> 0 row(s) in 0.0110 seconds
>> hbase(main):007:0> get 'table', 'row'
>> COLUMN                       CELL
>>  family:qual                   timestamp=1296264772717, value=1
>> 1 row(s) in 0.0080 seconds
>> hbase(main):008:0> put 'table', 'row', 'family:qual', '2'
>> 0 row(s) in 0.0020 seconds
>> hbase(main):009:0> put 'table', 'row', 'family:qual', '3'
>> 0 row(s) in 0.0020 seconds
>> hbase(main):010:0> get 'table', 'row'
>> COLUMN                       CELL
>>  family:qual                   timestamp=1296264797169, value=3
>> 1 row(s) in 0.0030 seconds
>> hbase(main):011:0> delete 'table', 'row', 'family:qual'
>> 0 row(s) in 0.0040 seconds
>> hbase(main):012:0> get 'table', 'row'
>> COLUMN                       CELL
>>  family:qual                   timestamp=1296264795365, value=2
>> 1 row(s) in 0.0630 seconds
>> hbase(main):013:0> delete 'table', 'row', 'family:qual'
>> 0 row(s) in 0.0360 seconds
>> hbase(main):014:0> get 'table', 'row'
>> COLUMN                       CELL
>>  family:qual                   timestamp=1296264772717, value=1
>> 1 row(s) in 0.0030 seconds
>> hbase(main):013:0> delete 'table', 'row', 'family:qual'
>> 0 row(s) in 0.0360 seconds
>> hbase(main):016:0> get 'table', 'row'
>> COLUMN                       CELL
>> 0 row(s) in 0.0030 seconds
>> 
>> 
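
The session above can be replayed in a small Python sketch. This is a toy model with made-up names (put, get, delete_latest_version), not HBase code. It assumes that each shell delete masks only the newest visible version and that the older cells are still stored because no major compaction has run yet.

```python
# Toy model only: not HBase code. Names are hypothetical.

cells = {}       # timestamp -> value; older cells stay until a major compaction
deleted = set()  # timestamps masked by single-version delete markers


def put(timestamp, value):
    cells[timestamp] = value


def get():
    """Return (timestamp, value) of the newest cell not masked by a marker, else None."""
    visible = [ts for ts in cells if ts not in deleted]
    if not visible:
        return None
    newest = max(visible)
    return newest, cells[newest]


def delete_latest_version():
    """Mask only the newest visible version, as the shell delete did in the session."""
    current = get()
    if current is not None:
        deleted.add(current[0])


# Same values and timestamps as in the shell session.
put(1296264772717, "1")
put(1296264795365, "2")
put(1296264797169, "3")

seen = [get()]
for _ in range(3):
    delete_latest_version()
    seen.append(get())

for entry in seen:
    print(entry)
```

The reads return value 3, then 2, then 1, then nothing, which matches the order seen in the shell output.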


Re: Delete reveals older version of a column even when VERSIONS=1

Posted by Mike Percy <mp...@yahoo-inc.com>.
Hi David and Ryan,
That is very interesting! This makes things much clearer.

Thanks for your help!
Mike

On Jan 31, 2011, at 4:40 PM, Ryan Rawson wrote:

> You are correct. Since we do not prune extra versions except during
> the major compactions that happen about once a day, deleting a recent
> version can expose an older version, and that is what you are seeing.
> 
> I might consider this a misfeature.  I would encourage you to
> consider using the Delete.deleteColumns() call found here:
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumns(byte[], byte[])
> 
> and NOT USE:
> 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumn(byte[], byte[])
> 
> Note that the only difference between the two names is the plural 's' in 'Columns'.
> 
> I hope this helps!
> -ryan
> 


Re: Delete reveals older version of a column even when VERSIONS=1

Posted by Ryan Rawson <ry...@gmail.com>.
You are correct. Since we do not prune extra versions except during
the major compactions that happen about once a day, deleting a recent
version can expose an older version, and that is what you are seeing.

I might consider this a misfeature.  I would encourage you to
consider using the Delete.deleteColumns() call found here:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumns(byte[], byte[])

and NOT USE:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumn(byte[], byte[])

Note that the only difference between the two names is the plural 's' in 'Columns'.
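
As a rough illustration of how the two kinds of delete differ, here is a minimal Python sketch. It is a toy model and not the HBase implementation: the Column class, its fields, and the small integer timestamps are all hypothetical. The method names delete_column and delete_columns only mirror the two Java calls named above. The model assumes a single-version marker masks exactly one timestamp, while a column marker masks every version at or below its own timestamp.

```python
# Toy model only: not the HBase implementation. Names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Column:
    cells: dict = field(default_factory=dict)          # timestamp -> value
    version_markers: set = field(default_factory=set)  # each masks exactly one timestamp
    column_marker: int = -1                            # masks every timestamp <= this

    def put(self, ts, value):
        self.cells[ts] = value

    def _visible(self):
        return [ts for ts in self.cells
                if ts > self.column_marker and ts not in self.version_markers]

    def get(self):
        visible = self._visible()
        return self.cells[max(visible)] if visible else None

    def delete_column(self):
        """Type (a): mask only the newest visible version."""
        visible = self._visible()
        if visible:
            self.version_markers.add(max(visible))

    def delete_columns(self, now):
        """Type (b): mask all versions with timestamp <= now."""
        self.column_marker = max(self.column_marker, now)


def three_versions():
    col = Column()
    for ts, value in [(100, "1"), (200, "2"), (300, "3")]:
        col.put(ts, value)
    return col


a = three_versions()
a.delete_column()
print(a.get())  # 2: the next older version is exposed

b = three_versions()
b.delete_columns(now=400)
print(b.get())  # None: every existing version is masked
b.put(500, "4")
print(b.get())  # 4: a write after the marker is visible again
```

In this sketch a single call of type (b) hides every stored version, so no older value can resurface before the next major compaction.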

I hope this helps!
-ryan
