Posted to user@hbase.apache.org by Takahiko Kawasaki <da...@gmail.com> on 2012/08/14 16:54:14 UTC

Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Hello,

I have a problem where 'put' with timestamp does not succeed.
I did the following at the HBase shell.

(1) Do 'put' with timestamp.
      # 'scan' shows 1 row.

(2) Delete the row by 'deleteall'.
      # 'scan' says "0 row(s)".

(3) Do 'put' again with the same command line as (1).
      # 'scan' says "0 row(s)"! Why?

(4) Increment the timestamp value by 1 and try 'put' again.
      # 'scan' still says "0 row(s)"! Why?

The command lines I actually typed are as follows, and the attached
file contains their output.

scan 'test-table'
put 'test-table', 'row3', 'test-family', 'value'
scan 'test-table'
deleteall 'test-table', 'row3'
scan 'test-table'
put 'test-table', 'row3', 'test-family', 'value'
scan 'test-table'
deleteall 'test-table', 'row3'
scan 'test-table'
put 'test-table', 'row4', 'test-family', 'value', 10
scan 'test-table'
deleteall 'test-table', 'row4'
scan 'test-table'
put 'test-table', 'row4', 'test-family', 'value', 10
scan 'test-table'
put 'test-table', 'row4', 'test-family', 'value', 10
scan 'test-table'
quit

Is this behavior part of the HBase specification?

My cluster is built using CDH4 and the HBase version is 0.92.1-cdh4.0.0.

Could anyone give me any insight, please?

Best Regards,
Takahiko Kawasaki

Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Posted by Stack <st...@duboce.net>.
On Wed, Aug 15, 2012 at 9:13 AM, lars hofhansl <lh...@yahoo.com> wrote:
> I also have a short blog post about this here: http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html
>

I added a link to this discussion to the Versioning section of our
reference guide (thanks to everyone above).
St.Ack

Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Posted by lars hofhansl <lh...@yahoo.com>.
I also have a short blog post about this here: http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html




Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Posted by Harsh J <ha...@cloudera.com>.
Yonghu,

You are correct about that. Until a major_compact finishes, values
inserted with old timestamps will never show. Any old-timestamped
values inserted after a delete but before the major compaction will
all be purged by it.

That is why I had to put the data into the table _after_ the
major_compact ran, in the shell output I sent.


-- 
Harsh J

Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Posted by yonghu <yo...@gmail.com>.
Hi Harsh,

I have a question about your description. The delete marker masks the
newly inserted value with the old timestamp; that's why the newly
inserted data can't be seen. But after a major compaction, this new
value will be seen again. So, my question is how the deletion is really
executed. In my understanding, the deletion removes all data values
whose TSs are less than or equal to the TS of the delete marker. So, if
you insert a value with an old TS after the delete marker has been
inserted, it should also be deleted at compaction time. For example,
suppose I first insert (k1,t1), then delete (k1,t1) with a delete marker
whose TS is greater than t1, and then reinsert (k1,t1). Then, at
compaction time, both copies of (k1,t1) should be deleted.
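The (k1,t1) example above can be replayed in a toy Python model. This is a hedged sketch, not HBase code: the `ToyStore` class, its method names and the concrete timestamps are hypothetical. It assumes that a delete marker with timestamp T hides every version whose timestamp is <= T regardless of when that version was written, and that a major compaction purges all hidden versions and then the marker itself. Under those assumptions both copies of (k1,t1) are purged, which agrees with Harsh's reply in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model: versions of a key as (key, ts) pairs plus delete markers."""
    cells: list = field(default_factory=list)    # (key, ts) tuples
    markers: dict = field(default_factory=dict)  # key -> delete-marker ts

    def put(self, key, ts):
        self.cells.append((key, ts))

    def delete(self, key, marker_ts):
        # The marker hides every version of `key` with ts <= marker_ts,
        # including versions written after the marker itself.
        self.markers[key] = max(self.markers.get(key, marker_ts), marker_ts)

    def visible(self):
        return [c for c in self.cells if c[1] > self.markers.get(c[0], -1)]

    def major_compact(self):
        # Keep only unmasked versions, drop the markers, report how many were purged.
        before = len(self.cells)
        self.cells = self.visible()
        self.markers.clear()
        return before - len(self.cells)


t1 = 5                                   # hypothetical timestamp
store = ToyStore()
store.put("k1", t1)                      # insert (k1, t1)
store.delete("k1", marker_ts=t1 + 100)   # marker TS greater than t1
store.put("k1", t1)                      # reinsert (k1, t1) after the marker
print(store.visible())                   # []
print(store.major_compact())             # 2
store.put("k1", t1)                      # written after the compaction
print(store.visible())                   # [('k1', 5)]
```

Only a version written after the compaction survives, because by then the marker that would hide it is gone.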

I look forward to your response!

Yong




Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Posted by Takahiko Kawasaki <da...@gmail.com>.
Dear Harsh,

Thank you very much for your detailed explanation. I now understand
what was going on during my put/scan/delete operations. I'll modify
my application and test programs to take the timestamp implementation
into account.

Best Regards,
Takahiko Kawasaki


Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Posted by Harsh J <ha...@cloudera.com>.
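When a Delete occurs, a delete marker is inserted with the current
time as its timestamp. The marker masks every version of the row whose
timestamp is less than or equal to its own. Hence, when you insert a
value after this with an _older_ timestamp, it is masked and ignored
when scanning. This is why you do not see the data. Incrementing the
timestamp by 1 does not help either: 10 or 11 is still far older than
the marker's timestamp, which is the current time in milliseconds.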
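A minimal sketch of this masking rule, as a toy Python model rather than HBase code: `ToyStore`, its integer clock and its method names are hypothetical, and it assumes that `deleteall` writes a single marker stamped with the current time that hides every cell of the row whose timestamp is less than or equal to the marker's. Replaying the row3 and row4 commands from the original question reproduces the "0 row(s)" results for row4.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of one HBase column with row delete markers (not HBase code)."""
    clock: int = 1_000  # hypothetical "current time"; advanced on every operation
    cells: list = field(default_factory=list)    # (row, ts, value) tuples
    markers: dict = field(default_factory=dict)  # row -> delete-marker timestamp

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def put(self, row, value, ts=None):
        # Without an explicit timestamp, the cell is stamped with "now".
        now = self._tick()
        self.cells.append((row, now if ts is None else ts, value))

    def deleteall(self, row):
        # The marker is stamped with "now" and hides every cell of `row`
        # whose timestamp is <= the marker's, even cells written later.
        self.markers[row] = max(self.markers.get(row, 0), self._tick())

    def scan(self):
        latest = {}
        for row, ts, value in self.cells:
            if ts <= self.markers.get(row, -1):
                continue  # masked by the row's delete marker
            if row not in latest or ts >= latest[row][0]:
                latest[row] = (ts, value)
        return {row: val for row, (_, val) in sorted(latest.items())}


store = ToyStore()
store.put("row3", "value")           # server-assigned timestamp
store.deleteall("row3")
store.put("row3", "value")           # newer than the marker, so visible
print(store.scan())                  # {'row3': 'value'}
store.deleteall("row3")

store.put("row4", "value", ts=10)    # step (1)
store.deleteall("row4")              # step (2)
store.put("row4", "value", ts=10)    # step (3): 10 <= marker ts, masked
store.put("row4", "value", ts=11)    # step (4): 11 is still older
print(store.scan())                  # {}
```

Letting the server assign timestamps (the row3 case) sidesteps the problem, in line with the advice in this reply to rely on versions rather than manual timestamps.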

If you instead insert this after a major compaction has fully run on
this store (after a flush, so that the delete marker has been written
out to a store file), then your value will indeed be shown after the
insert, because at that moment the delete marker no longer exists.

hbase(main):060:0> flush 'test-table'
0 row(s) in 0.1020 seconds

hbase(main):061:0> major_compact 'test-table'
0 row(s) in 0.0400 seconds

hbase(main):062:0> put 'test-table', 'row4', 'test-family', 'value', 10
0 row(s) in 0.0230 seconds

hbase(main):063:0> scan 'test-table'
ROW                   COLUMN+CELL
 row4                 column=test-family:, timestamp=10, value=value
1 row(s) in 0.0060 seconds
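The transcript can be mirrored in a small toy model, as a hedged sketch rather than HBase's implementation. The `ToyStore` class and its methods are hypothetical; the model assumes a major compaction drops every cell hidden by a delete marker and then the marker itself. It has no memstore, so it has no counterpart to `flush` (on a real cluster the flush writes the marker out to a store file that the compaction can process), and it treats `major_compact` as finishing immediately, whereas the shell command only requests the compaction.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of one HBase column with row delete markers (not HBase code)."""
    clock: int = 1_000  # hypothetical "current time"; advanced on every operation
    cells: list = field(default_factory=list)    # (row, ts, value) tuples
    markers: dict = field(default_factory=dict)  # row -> delete-marker timestamp

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def put(self, row, value, ts=None):
        now = self._tick()
        self.cells.append((row, now if ts is None else ts, value))

    def deleteall(self, row):
        self.markers[row] = max(self.markers.get(row, 0), self._tick())

    def _visible(self, row, ts):
        return ts > self.markers.get(row, -1)

    def major_compact(self):
        # Purge every masked cell, then the markers themselves.
        self.cells = [c for c in self.cells if self._visible(c[0], c[1])]
        self.markers.clear()

    def scan(self):
        latest = {}
        for row, ts, value in self.cells:
            if self._visible(row, ts) and (row not in latest or ts >= latest[row][0]):
                latest[row] = (ts, value)
        return {row: val for row, (_, val) in sorted(latest.items())}


store = ToyStore()
store.put("row4", "value", ts=10)
store.deleteall("row4")
store.put("row4", "value", ts=10)    # written before the compaction: masked
print(store.scan())                  # {}
store.major_compact()                # marker and both masked cells are purged
store.put("row4", "value", ts=10)    # written after the compaction
print(store.scan())                  # {'row4': 'value'}
```

The order relative to the compaction is what matters: the timestamp-10 put made before it is purged along with the marker, while the identical put made afterwards is visible.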

I suppose this is why it is recommended not to mess with the
timestamps manually, and instead just rely on versions.




-- 
Harsh J