Posted to user@hbase.apache.org by Yi Liang <wh...@gmail.com> on 2011/11/24 08:38:18 UTC

delete operation with timestamp

We're using hbase-0.90.3 with thrift client, and have encountered some
problems when we want to delete one specific version of a cell.

First, there's no corresponding thrift api for Delete#deleteColumn(byte []
family, byte [] qualifier, long timestamp). Instead, deleteColumns is
supported in mutateRowTs.  But what we want is deleteColumn as we need to
keep the older versions. IMO, we should implement mutateRowTs
with deleteColumn, rather than deleteColumns. The hbase shell's delete
command has the same problem.

Second, we find we can't reinsert any older cell if we have deleted that
cell with deleteColumns. For example:
hbase(main):007:0> scan 'test3'
ROW                                           COLUMN+CELL
0 row(s) in 0.0110 seconds

hbase(main):008:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
0 row(s) in 0.0100 seconds

hbase(main):009:0> scan 'test3'
ROW                                           COLUMN+CELL
 r1                                           column=f1:c1, timestamp=1315550678308, value=old
1 row(s) in 0.0290 seconds

hbase(main):012:0> put 'test3', 'r1', 'f1:c1', 'new'
0 row(s) in 0.0090 seconds

hbase(main):013:0> scan 'test3'
ROW                                           COLUMN+CELL
 r1                                           column=f1:c1, timestamp=1322119570316, value=new
1 row(s) in 0.0140 seconds

hbase(main):014:0> delete 'test3', 'r1', 'f1:c1', 1322119570316
0 row(s) in 0.0130 seconds

hbase(main):015:0> scan 'test3'
ROW                                           COLUMN+CELL
0 row(s) in 0.0120 seconds

hbase(main):016:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
0 row(s) in 0.0090 seconds

hbase(main):017:0> scan 'test3'
ROW                                           COLUMN+CELL
0 row(s) in 0.0110 seconds

There's no error message when we reinsert the old version, so we assumed it
had succeeded, but the cell never becomes visible. It looks like a bug.
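
For readers following along, the transcript above can be reproduced with a small toy model. This is only a sketch in Python, not HBase code: the `ToyColumn` class and its method names are hypothetical. It assumes the usual marker semantics, namely that a deleteColumns-style marker at timestamp T masks every version with a timestamp <= T, while a deleteColumn-style marker masks only the version at exactly T.

```python
class ToyColumn:
    """Toy model of a single HBase column (hypothetical, not real HBase code)."""

    def __init__(self):
        self.cells = {}             # timestamp -> value
        self.range_markers = []     # deleteColumns-style: masks ts <= marker
        self.point_markers = set()  # deleteColumn-style: masks ts == marker

    def put(self, ts, value):
        # A put always "succeeds", even if a tombstone will mask the cell.
        self.cells[ts] = value

    def delete_columns(self, ts):
        self.range_markers.append(ts)

    def delete_column(self, ts):
        self.point_markers.add(ts)

    def _masked(self, ts):
        if ts in self.point_markers:
            return True
        return any(ts <= marker for marker in self.range_markers)

    def visible(self):
        """Return the visible (timestamp, value) pairs, newest first."""
        return [(ts, value)
                for ts, value in sorted(self.cells.items(), reverse=True)
                if not self._masked(ts)]


OLD_TS, NEW_TS = 1315550678308, 1322119570316

# What the shell's delete did in the transcript (deleteColumns semantics).
col = ToyColumn()
col.put(OLD_TS, "old")
col.put(NEW_TS, "new")
col.delete_columns(NEW_TS)
col.put(OLD_TS, "old")      # accepted, but masked by the tombstone
print(col.visible())        # []

# What we actually wanted (deleteColumn semantics): only one version goes away.
col2 = ToyColumn()
col2.put(OLD_TS, "old")
col2.put(NEW_TS, "new")
col2.delete_column(NEW_TS)
print(col2.visible())       # [(1315550678308, 'old')]
```

In the first case the re-inserted cell is stored but stays invisible, which matches the empty scan at the end of the transcript.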

What's your opinion?

Thanks,
Yi

Re: delete operation with timestamp

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Hi Lars,
>>You could look at the code :)
Did exactly that. Just wanted to be sure that I am not missing any insight.

>>Typically you won't add many columns with different time stamps as part
of the same put... You are right, though, it is not strictly needed.
Understood now.

Thanks for bearing with me Lars.

-Shrijeet


On Mon, Nov 28, 2011 at 8:16 PM, lars hofhansl <lh...@yahoo.com> wrote:

> You could look at the code :)
>
>
> The time stamps that count are the ones on the KeyValues maintained in the
> put's familyMap (the set of KVs mapped to CFs).
>
> In fact the put's TS is just a convenience used as default TS for the
> added KVs, it is not used at the server.
> Typically you won't add many columns with different time stamps as part of
> the same put... You are right, though, it is not strictly needed.

Re: delete operation with timestamp

Posted by lars hofhansl <lh...@yahoo.com>.
You could look at the code :)


The time stamps that count are the ones on the KeyValues maintained in the put's familyMap (the set of KVs mapped to CFs).

In fact, the put's TS is just a convenience used as the default TS for the added KVs; it is not used at the server.
Typically you won't add many columns with different time stamps as part of the same put... You are right, though, it is not strictly needed.
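
To make the point about the put's TS concrete, here is a toy sketch in Python. It is not the HBase client: `ToyPut` and the `LATEST` sentinel are hypothetical names. It assumes only what is described above, i.e. that the put-level timestamp is a client-side default copied onto each added KV unless `add` is given its own timestamp, and that only the per-KV timestamps matter afterwards.

```python
LATEST = object()  # hypothetical stand-in for LATEST_TIMESTAMP


class ToyPut:
    """Toy model of a client-side Put (hypothetical, not real HBase code)."""

    def __init__(self, row, ts=LATEST):
        self.row = row
        self.ts = ts          # only a default for the cells added below
        self.family_map = {}  # family -> list of (qualifier, ts, value)

    def add(self, family, qualifier, value, ts=None):
        # An explicit ts wins; otherwise the cell inherits the put's default.
        cell_ts = self.ts if ts is None else ts
        self.family_map.setdefault(family, []).append((qualifier, cell_ts, value))
        return self


put = ToyPut("r1", ts=100)
put.add("f1", "c1", "a")         # inherits 100
put.add("f1", "c2", "b", ts=42)  # explicit per-cell timestamp
put.add("f1", "c3", "c")         # inherits 100

timestamps = [ts for _, ts, _ in put.family_map["f1"]]
print(timestamps)                # [100, 42, 100]
```

Once the cells are in the family map, nothing in this model ever reads `put.ts` again, which is the sense in which it is "not strictly needed".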


----- Original Message -----
From: Shrijeet Paliwal <sh...@rocketfuel.com>
To: lars hofhansl <lh...@yahoo.com>
Cc: "user@hbase.apache.org" <us...@hbase.apache.org>
Sent: Monday, November 28, 2011 5:49 PM
Subject: Re: delete operation with timestamp

Lars,
Thank you for writing. It does make sense.

>> So if you trigger a Put operation from the client and you change (say) 3
>> columns, the server will insert 3 KeyValues into the Memstore all of which
>> carry the TS of the Put.
What if I construct the Put object with three calls to 'add', each with my
own timestamp:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add(byte[], byte[], long, byte[])
In such a case the KeyValue list members will have a different TS than
the TS of the Put. What will the TS of the Put mean on the server side then?

>> Having the TS per cell (or KeyValue) is necessary to enforce ACID
>> guarantees, which state that what you retrieve with Get is a set of
>> KeyValues such that this combination of versions of KeyValues for this row
>> existed together at a point. (need to remember here that multiple Put
>> operations could insert different columns for the same rowKey).
Yes, this totally makes sense. And my question is around this: what is the
need to maintain a TS on the Put at all? Even if the client does not want to
specify a timestamp, the burden of including the latest timestamp can be
passed to the KeyValue object.

-Shrijeet

On Mon, Nov 28, 2011 at 5:33 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Hi Shrijeet,
>
> you have to distinguish between the storage format and the client side
> objects. KeyValue is an outlier (of sorts) as it is used on both server and
> client).
> Timestamps are per cell (KeyValue).
>
>
> A Put object is something you create on the client to describe a put
> operation to be performed at the server.
> The server will take the information from the Put and write the necessary
> KeyValues into the Memstore (which will eventually be flushed to disk).
>
> So if you trigger a Put operations from the client and you change (say) 3
> columns, the server will insert 3 KeyValues into the Memstore all of which
> carry
> the TS of the Put.
>
> Having the TS per cell (or KeyValue) is necessary to enforce ACID
> guarantees, which state that what you retrieve with Get is a set of
> KeyValues such as this
> combination of versions of KeyValues for this row existed together at a
> point. (need to remember here that multiple Put operations could insert
> different columns for the same rowKey).
>
>
> Makes sense?
>
> -- Lars
>
>
> ----- Original Message -----
> From: Shrijeet Paliwal <sh...@rocketfuel.com>
> To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
> Cc:
> Sent: Monday, November 28, 2011 4:31 PM
> Subject: Re: delete operation with timestamp
>
> Slightly offtopic, sorry.
>
> While we have attention on timestamps may I ask why HBase maintains a
> timestamp at row level (initialized with LATEST_TIMESTAMP)?
> In other words timestamp has meaning in context of a cell and HBase
> keeps it at that level, then why keep one TS at row level. Going
> further, what is the meaning of
> a timestamp 'ts' associated with Put object if all the KeyValue
> objects associated have timestamp different than 'ts'.
>
> Was the motivation behind this, to allow client not specify timestamp
> (in turn assume they meant latest ts)?
>
> I am looking at line 5 of this function http://pastebin.com/ik1Dxgqq
> which is serializing timestamp at row level and at lines 18-21 which
> are serializing timestamp at cell level.
>
> Thanks.
>
>
> On Mon, Nov 28, 2011 at 3:56 PM, lars hofhansl <lh...@yahoo.com>
> wrote:
> > Hi Yi,
> > the reason is that nothing is ever changed in-place in HBase, only new
> files are created (with the exception of the WAL, which is appended to,
> > and some special scenario like atomic increment and atomic appends,
> where older version of the cells are removed from the memstore).
> >
> > That caters very well to the performance characteristics of the
> underlying distributed file system (HDFS).
> >
> >
> > Consequently deleted rows are not actually deleted right away, we just
> record the fact the rows should not be visible anymore and can eventually
> be removed.
> > The actual removal happens during the next compaction when new files are
> created.
> >
> > Sometimes that does lead to unexpected behaviors such as the one you
> describe below.
> >
> > In the trunk version of HBase I introduced the possibility to perform
> time-range queries that can "peek" behind delete markers to retrieve cells
> that are marked as deleted. (HBASE-4536)
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Yi Liang <wh...@gmail.com>
> > To: user@hbase.apache.org
> > Cc:
> > Sent: Thursday, November 24, 2011 10:11 PM
> > Subject: Re: delete operation with timestamp
> >
> > Thanks Daniel for your explanation. But still curious why we do such
> > design, it's unexpected for me.
> >
> > Also, this behavior of deleteColumns make delete operation not very user
> > friendly, why not use deleteColumn instead in hbase shell and thrift
> client?
> >
> > Thanks,
> > Yi
> >
> > 2011/11/24 Daniel Gómez Ferro <da...@yahoo-inc.com>
> >
> >>
> >> On Nov 24, 2011, at 08:38 , Yi Liang wrote:
> >>
> >> > We're using hbase-0.90.3 with thrift client, and have encountered some
> >> > problems when we want to delete one specific version of a cell.
> >> >
> >> > First, there's no corresponding thrift api for
> Delete#deleteColumn(byte
> >> []
> >> > family, byte [] qualifier, long timestamp). Instead, deleteColumns is
> >> > supported in mutateRowTs.  But what we want is deleteColumn as we
> need to
> >> > keep the older versions. IMO, we should implement mutateRowTs
> >> > with deleteColumn, rather than deleteColumns. The hbase shell's delete
> >> > command has the same problem.
> >> >
> >> > Second, we find we can't reinsert any older cell if we have deleted
> that
> >> > cell with deleteColumns. For example:
> >> > hbase(main):007:0> scan 'test3'
> >> > ROW                                           COLUMN+CELL
> >> > 0 row(s) in 0.0110 seconds
> >> >
> >> > hbase(main):008:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
> >> > 0 row(s) in 0.0100 seconds
> >> >
> >> > hbase(main):009:0> scan 'test3'
> >> > ROW                                           COLUMN+CELL
> >> > r1                                           column=f1:c1,
> >> > timestamp=1315550678308, value=old
> >> > 1 row(s) in 0.0290 seconds
> >> >
> >> > hbase(main):012:0> put 'test3', 'r1', 'f1:c1', 'new'
> >> > 0 row(s) in 0.0090 seconds
> >> >
> >> > hbase(main):013:0> scan 'test3'
> >> > ROW                                           COLUMN+CELL
> >> > r1                                           column=f1:c1,
> >> > timestamp=1322119570316, value=new
> >> > 1 row(s) in 0.0140 seconds
> >> >
> >> > hbase(main):014:0> delete 'test3', 'r1', 'f1:c1', 1322119570316
> >> > 0 row(s) in 0.0130 seconds
> >> >
> >> > hbase(main):015:0> scan 'test3'
> >> > ROW                                           COLUMN+CELL
> >> > 0 row(s) in 0.0120 seconds
> >> >
> >> > hbase(main):016:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
> >> > 0 row(s) in 0.0090 seconds
> >> >
> >> > hbase(main):017:0> scan 'test3'
> >> > ROW                                           COLUMN+CELL
> >> > 0 row(s) in 0.0110 seconds
> >> >
> >> > There's no error message when we reinsert the old version, so we
> think it
> >> > has succeeded, but actually it's not. It looks like a bug.
> >> >
> >> > What's your opinion?
> >> >
> >>
> >> Hi,
> >>
> >> The second point is not a bug, it's how HBase is designed. Any delete
> >> (except deleteColumn) inserts a tombstone marker which masks any older
> >> value, so even if you insert later an older value it will be masked by
> the
> >> tombstone. You can see some nice examples here:
> >> http://outerthought.org/blog/417-ot.html
> >>
> >> There is also a new feature in trunk that allows you to retrieve masked
> >> values through a "raw scan" or a get with a timeRange that excludes the
> >> delete: https://issues.apache.org/jira/browse/HBASE-4536
> >>
> >> Daniel
> >>
> >> > Thanks,
> >> > Yi
> >>
> >>
> >
> >
>
>


Re: delete operation with timestamp

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Lars,
Thank you for writing. It does make sense.

>> So if you trigger a Put operation from the client and you change (say) 3
>> columns, the server will insert 3 KeyValues into the Memstore all of which
>> carry the TS of the Put.
What if I construct the Put object with three calls to 'add', each with my
own timestamp:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add(byte[], byte[], long, byte[])
In such a case the KeyValue list members will have a different TS than
the TS of the Put. What will the TS of the Put mean on the server side then?

>> Having the TS per cell (or KeyValue) is necessary to enforce ACID
>> guarantees, which state that what you retrieve with Get is a set of
>> KeyValues such that this combination of versions of KeyValues for this row
>> existed together at a point. (need to remember here that multiple Put
>> operations could insert different columns for the same rowKey).
Yes, this totally makes sense. And my question is around this: what is the
need to maintain a TS on the Put at all? Even if the client does not want to
specify a timestamp, the burden of including the latest timestamp can be
passed to the KeyValue object.

-Shrijeet


Re: delete operation with timestamp

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Shrijeet,

you have to distinguish between the storage format and the client-side objects. (KeyValue is an outlier of sorts, as it is used on both server and client.)
Timestamps are per cell (KeyValue).


A Put object is something you create on the client to describe a put operation to be performed at the server.
The server will take the information from the Put and write the necessary KeyValues into the Memstore (which will eventually be flushed to disk).

So if you trigger a Put operation from the client and you change (say) 3 columns, the server will insert 3 KeyValues into the Memstore, all of which carry
the TS of the Put.

Having the TS per cell (or KeyValue) is necessary to enforce ACID guarantees, which state that what you retrieve with Get is a set of KeyValues such that this
combination of versions of KeyValues for this row existed together at some point. (Remember that multiple Put operations could insert different columns for the same rowKey.)
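
As a rough illustration of why the TS lives on each cell, here is a toy sketch in Python. It is not HBase code: `memstore`, `apply_put` and `get` are hypothetical names, and the model assumes that a Get returns the newest version of each column. Two puts touch different columns of the same row, so the row that comes back mixes cells written at different times.

```python
from collections import defaultdict

# Toy memstore: (row, column) -> {timestamp: value}. Not real HBase code.
memstore = defaultdict(dict)


def apply_put(row, ts, columns):
    """Write one KeyValue per changed column, each carrying the put's TS."""
    for column, value in columns.items():
        memstore[(row, column)][ts] = value


def get(row):
    """Return the newest (timestamp, value) of every column in the row."""
    result = {}
    for (r, column), versions in memstore.items():
        if r == row:
            newest = max(versions)
            result[column] = (newest, versions[newest])
    return result


apply_put("r1", 10, {"f1:a": "a10", "f1:b": "b10", "f1:c": "c10"})  # 3 KeyValues
apply_put("r1", 20, {"f1:b": "b20"})                                # 1 KeyValue

row = get("r1")
print(row)
```

A single row-level timestamp could not describe this result, since `f1:b` comes from the second put while `f1:a` and `f1:c` still come from the first.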


Makes sense?

-- Lars


----- Original Message -----
From: Shrijeet Paliwal <sh...@rocketfuel.com>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Cc: 
Sent: Monday, November 28, 2011 4:31 PM
Subject: Re: delete operation with timestamp

Slightly offtopic, sorry.

While we have attention on timestamps, may I ask why HBase maintains a
timestamp at row level (initialized with LATEST_TIMESTAMP)?
In other words, a timestamp has meaning in the context of a cell and HBase
keeps it at that level, so why keep one TS at row level? Going
further, what is the meaning of
a timestamp 'ts' associated with a Put object if all the associated KeyValue
objects have a timestamp different from 'ts'?

Was the motivation behind this to allow the client not to specify a timestamp
(and in turn assume they meant the latest TS)?

I am looking at line 5 of this function http://pastebin.com/ik1Dxgqq
which is serializing timestamp at row level and at lines 18-21 which
are serializing timestamp at cell level.

Thanks.


On Mon, Nov 28, 2011 at 3:56 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Hi Yi,
> the reason is that nothing is ever changed in-place in HBase, only new files are created (with the exception of the WAL, which is appended to,
> and some special scenarios like atomic increments and atomic appends, where older versions of the cells are removed from the memstore).
>
> That caters very well to the performance characteristics of the underlying distributed file system (HDFS).
>
>
> Consequently deleted rows are not actually deleted right away; we just record the fact that the rows should not be visible anymore and can eventually be removed.
> The actual removal happens during the next compaction when new files are created.
>
> Sometimes that does lead to unexpected behaviors such as the one you describe below.
>
> In the trunk version of HBase I introduced the possibility to perform time-range queries that can "peek" behind delete markers to retrieve cells that are marked as deleted. (HBASE-4536)
>
> -- Lars
>
>
> ----- Original Message -----
> From: Yi Liang <wh...@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Thursday, November 24, 2011 10:11 PM
> Subject: Re: delete operation with timestamp
>
> Thanks Daniel for your explanation. But I'm still curious why HBase is
> designed this way; it's unexpected to me.
>
> Also, this behavior of deleteColumns makes the delete operation not very user
> friendly. Why not use deleteColumn instead in the hbase shell and thrift client?
>
> Thanks,
> Yi
>
> 2011/11/24 Daniel Gómez Ferro <da...@yahoo-inc.com>
>
>>
>> On Nov 24, 2011, at 08:38 , Yi Liang wrote:
>>
>> > We're using hbase-0.90.3 with thrift client, and have encountered some
>> > problems when we want to delete one specific version of a cell.
>> >
>> > First, there's no corresponding thrift api for Delete#deleteColumn(byte
>> []
>> > family, byte [] qualifier, long timestamp). Instead, deleteColumns is
>> > supported in mutateRowTs.  But what we want is deleteColumn as we need to
>> > keep the older versions. IMO, we should implement mutateRowTs
>> > with deleteColumn, rather than deleteColumns. The hbase shell's delete
>> > command has the same problem.
>> >
>> > Second, we find we can't reinsert any older cell if we have deleted that
>> > cell with deleteColumns. For example:
>> > hbase(main):007:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > 0 row(s) in 0.0110 seconds
>> >
>> > hbase(main):008:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
>> > 0 row(s) in 0.0100 seconds
>> >
>> > hbase(main):009:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > r1                                           column=f1:c1,
>> > timestamp=1315550678308, value=old
>> > 1 row(s) in 0.0290 seconds
>> >
>> > hbase(main):012:0> put 'test3', 'r1', 'f1:c1', 'new'
>> > 0 row(s) in 0.0090 seconds
>> >
>> > hbase(main):013:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > r1                                           column=f1:c1,
>> > timestamp=1322119570316, value=new
>> > 1 row(s) in 0.0140 seconds
>> >
>> > hbase(main):014:0> delete 'test3', 'r1', 'f1:c1', 1322119570316
>> > 0 row(s) in 0.0130 seconds
>> >
>> > hbase(main):015:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > 0 row(s) in 0.0120 seconds
>> >
>> > hbase(main):016:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
>> > 0 row(s) in 0.0090 seconds
>> >
>> > hbase(main):017:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > 0 row(s) in 0.0110 seconds
>> >
>> > There's no error message when we reinsert the old version, so we think it
>> > has succeeded, but actually it's not. It looks like a bug.
>> >
>> > What's your opinion?
>> >
>>
>> Hi,
>>
>> The second point is not a bug, it's how HBase is designed. Any delete
>> (except deleteColumn) inserts a tombstone marker which masks any older
>> value, so even if you insert later an older value it will be masked by the
>> tombstone. You can see some nice examples here:
>> http://outerthought.org/blog/417-ot.html
>>
>> There is also a new feature in trunk that allows you to retrieve masked
>> values through a "raw scan" or a get with a timeRange that excludes the
>> delete: https://issues.apache.org/jira/browse/HBASE-4536
>>
>> Daniel
>>
>> > Thanks,
>> > Yi
>>
>>
>
>


Re: delete operation with timestamp

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Slightly offtopic, sorry.

While we have attention on timestamps, may I ask why HBase maintains a
timestamp at the row level (initialized with LATEST_TIMESTAMP)?
In other words, a timestamp has meaning in the context of a cell, and HBase
keeps it at that level, so why also keep one at the row level? Going
further, what is the meaning of a timestamp 'ts' associated with a Put
object if all of its KeyValue objects carry timestamps different from 'ts'?

Was the motivation to let a client omit the timestamp (in which case the
latest timestamp is assumed)?

I am looking at line 5 of this function http://pastebin.com/ik1Dxgqq
which serializes the timestamp at the row level, and at lines 18-21, which
serialize it at the cell level.

Thanks.
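
One possible reading of the question above, sketched as a toy model rather than as HBase's actual code: the Put-level timestamp may act only as a default for cells added without an explicit timestamp. ToyPut and its add() signature are hypothetical names made up for this illustration; only the name LATEST_TIMESTAMP comes from the message, and its value here is an assumed placeholder.

```python
# Toy model (not HBase code): a row-level timestamp used only as a default.
LATEST_TIMESTAMP = 2**63 - 1  # assumed placeholder for "no timestamp given"


class ToyPut:
    def __init__(self, row, ts=LATEST_TIMESTAMP):
        self.row = row
        self.ts = ts      # row-level timestamp
        self.cells = []   # (family, qualifier, timestamp, value)

    def add(self, family, qualifier, value, ts=None):
        # Fall back to the row-level timestamp only when the cell has none.
        cell_ts = self.ts if ts is None else ts
        self.cells.append((family, qualifier, cell_ts, value))
        return self


put = ToyPut("r1", ts=5).add("f1", "c1", "a").add("f1", "c2", "b", ts=9)
print([cell[2] for cell in put.cells])
```

Under this assumption, a Put whose cells all carry their own timestamps never uses 'ts' at all, which is the case the question asks about.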



Re: delete operation with timestamp

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Yi,
the reason is that nothing is ever changed in place in HBase; only new files are created (with the exception of the WAL, which is appended to,
and some special scenarios such as atomic increments and atomic appends, where older versions of the cells are removed from the memstore).

That caters very well to the performance characteristics of the underlying distributed file system (HDFS).


Consequently, deleted rows are not actually deleted right away; we just record the fact that the rows should no longer be visible and can eventually be removed.
The actual removal happens during the next compaction, when new files are created.

Sometimes that does lead to unexpected behavior such as the one you describe.

In the trunk version of HBase I introduced the possibility of performing time-range queries that can "peek" behind delete markers to retrieve cells that are marked as deleted (HBASE-4536).

-- Lars
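
The append-only behavior described above can be sketched with a small toy model. This is not HBase code: ToyStore, delete_up_to, visible and compact are hypothetical names, and the model assumes a single column and a compaction that drops both the markers and the cells they hide. The timestamps are the ones from the shell session in the original post.

```python
# Toy model of an append-only store: puts and deletes only add records;
# physical removal happens when compact() rewrites the data.
PUT, MARKER = "put", "delete_marker"


class ToyStore:
    def __init__(self):
        self.records = []  # append-only list of (kind, timestamp, value)

    def put(self, ts, value):
        self.records.append((PUT, ts, value))

    def delete_up_to(self, ts):
        # Record a marker instead of touching existing records.
        self.records.append((MARKER, ts, None))

    def visible(self):
        # A put is hidden if some marker's timestamp is >= the put's timestamp.
        markers = [ts for kind, ts, _ in self.records if kind == MARKER]
        return sorted(
            (ts, value)
            for kind, ts, value in self.records
            if kind == PUT and not any(ts <= m for m in markers)
        )

    def compact(self):
        # Rewrite the data without markers and without the puts they hide.
        self.records = [(PUT, ts, value) for ts, value in self.visible()]


OLD, NEW = 1315550678308, 1322119570316
store = ToyStore()
store.put(OLD, "old")
store.put(NEW, "new")
store.delete_up_to(NEW)
store.put(OLD, "old")      # re-insert: still hidden by the marker
print(store.visible())
store.compact()            # marker and hidden puts are dropped
store.put(OLD, "old")      # the same put is now visible
print(store.visible())
```

In this model the re-inserted 'old' cell stays invisible until the compaction has dropped the marker, which matches the empty scan in the original post.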


----- Original Message -----
From: Yi Liang <wh...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Thursday, November 24, 2011 10:11 PM
Subject: Re: delete operation with timestamp

Thanks, Daniel, for your explanation. But I'm still curious why HBase is
designed this way; it's unexpected to me.

Also, this behavior of deleteColumns makes the delete operation not very
user-friendly. Why not use deleteColumn instead in the hbase shell and thrift client?

Thanks,
Yi



Re: delete operation with timestamp

Posted by Yi Liang <wh...@gmail.com>.
Thanks, Daniel, for your explanation. But I'm still curious why HBase is
designed this way; it's unexpected to me.

Also, this behavior of deleteColumns makes the delete operation not very
user-friendly. Why not use deleteColumn instead in the hbase shell and thrift client?

Thanks,
Yi


Re: delete operation with timestamp

Posted by Daniel Gómez Ferro <da...@yahoo-inc.com>.
On Nov 24, 2011, at 08:38 , Yi Liang wrote:

> We're using hbase-0.90.3 with thrift client, and have encountered some
> problems when we want to delete one specific version of a cell.
> 
> First, there's no corresponding thrift api for Delete#deleteColumn(byte []
> family, byte [] qualifier, long timestamp). Instead, deleteColumns is
> supported in mutateRowTs.  But what we want is deleteColumn as we need to
> keep the older versions. IMO, we should implement mutateRowTs
> with deleteColumn, rather than deleteColumns. The hbase shell's delete
> command has the same problem.
> 
> Second, we find we can't reinsert any older cell if we have deleted that
> cell with deleteColumns. For example:
> hbase(main):007:0> scan 'test3'
> ROW                                           COLUMN+CELL
> 0 row(s) in 0.0110 seconds
> 
> hbase(main):008:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
> 0 row(s) in 0.0100 seconds
> 
> hbase(main):009:0> scan 'test3'
> ROW                                           COLUMN+CELL
> r1                                           column=f1:c1,
> timestamp=1315550678308, value=old
> 1 row(s) in 0.0290 seconds
> 
> hbase(main):012:0> put 'test3', 'r1', 'f1:c1', 'new'
> 0 row(s) in 0.0090 seconds
> 
> hbase(main):013:0> scan 'test3'
> ROW                                           COLUMN+CELL
> r1                                           column=f1:c1,
> timestamp=1322119570316, value=new
> 1 row(s) in 0.0140 seconds
> 
> hbase(main):014:0> delete 'test3', 'r1', 'f1:c1', 1322119570316
> 0 row(s) in 0.0130 seconds
> 
> hbase(main):015:0> scan 'test3'
> ROW                                           COLUMN+CELL
> 0 row(s) in 0.0120 seconds
> 
> hbase(main):016:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
> 0 row(s) in 0.0090 seconds
> 
> hbase(main):017:0> scan 'test3'
> ROW                                           COLUMN+CELL
> 0 row(s) in 0.0110 seconds
> 
> There's no error message when we reinsert the old version, so we think it
> has succeeded, but actually it's not. It looks like a bug.
> 
> What's your opinion?
> 

Hi,

The second point is not a bug; it's how HBase is designed. Any delete (except deleteColumn) inserts a tombstone marker which masks any older value, so even if you later insert an older value, it will be masked by the tombstone. You can see some nice examples here: http://outerthought.org/blog/417-ot.html

There is also a new feature in trunk that allows you to retrieve masked values through a "raw scan" or a get with a timeRange that excludes the delete: https://issues.apache.org/jira/browse/HBASE-4536

Daniel

> Thanks,
> Yi
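
The difference between the two delete flavors discussed in this thread, and the idea of a raw scan, can be sketched as a toy model. It is not HBase code: ToyColumn and its method names are hypothetical, and it assumes that a deleteColumn-style delete hides only the cell at exactly the given timestamp while a deleteColumns-style delete hides every cell at or below it. A real raw scan would also return the delete markers themselves; this sketch returns only the cells.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    puts: dict = field(default_factory=dict)            # timestamp -> value
    version_markers: set = field(default_factory=set)   # hide one exact timestamp
    column_markers: set = field(default_factory=set)    # hide timestamps <= marker

    def put(self, ts, value):
        self.puts[ts] = value

    def delete_version(self, ts):
        # Assumed model of deleteColumn(family, qualifier, timestamp).
        self.version_markers.add(ts)

    def delete_versions_up_to(self, ts):
        # Assumed model of deleteColumns with a timestamp.
        self.column_markers.add(ts)

    def _masked(self, ts):
        return ts in self.version_markers or any(ts <= m for m in self.column_markers)

    def scan(self, raw=False):
        # raw=True ignores the markers and returns every stored cell.
        return [(ts, v) for ts, v in sorted(self.puts.items())
                if raw or not self._masked(ts)]


OLD, NEW = 1315550678308, 1322119570316

col = ToyColumn()
col.put(OLD, "old")
col.put(NEW, "new")
col.delete_versions_up_to(NEW)   # what the shell's delete did in the post
col.put(OLD, "old")              # masked by the marker
print(col.scan(), col.scan(raw=True))

col2 = ToyColumn()
col2.put(OLD, "old")
col2.put(NEW, "new")
col2.delete_version(NEW)         # only the newest version is hidden
print(col2.scan())
```

With the version-only delete the older cell stays visible, which is the behavior Yi asked for from the shell and thrift client.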