Posted to user@hbase.apache.org by yun peng <pe...@gmail.com> on 2012/10/21 22:53:24 UTC

How to config hbase0.94.2 to retain deleted data

Hi, All,
I want to retain all deleted key-value pairs in HBase. I have tried to
configure the HColumnDescriptor as follows so that deleted cells are retained:

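  // RegionObserver hook: runs after the region has already been opened and
  // changes only this descriptor object; as the replies below explain, it
  // did not enable KEEP_DELETED_CELLS for family "cf".
  public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
    HTableDescriptor htd = e.getEnvironment().getRegion().getTableDesc();
    HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
    hcd.setKeepDeletedCells(true);
    hcd.setBlockCacheEnabled(false);
  }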

However, it does not work for me: when I issue a delete and then query
by an older timestamp, the old data does not show up.

hbase(main):119:0> put 'usertable', "key1", 'cf:c1', "v1", 99
hbase(main):120:0> put 'usertable', "key1", 'cf:c1', "v2", 101
hbase(main):121:0> delete 'usertable', "key1", 'cf:c1', 100
hbase(main):122:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 99, VERSIONS => 4}
COLUMN                CELL

0 row(s) in 0.0040 seconds

hbase(main):123:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 100, VERSIONS => 4}
COLUMN                CELL

0 row(s) in 0.0050 seconds

hbase(main):124:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 101, VERSIONS => 4}
COLUMN                CELL

 cf:c1                timestamp=101, value=v2

1 row(s) in 0.0050 seconds

Note this is a new feature in 0.94
(HBASE-4536 <https://issues.apache.org/jira/browse/HBASE-4536>), and I
did not find much sample code online. Does anyone here have experience
using HBASE-4536? How should one configure HBase to enable this feature?

Thanks
Yun

Re: How to config hbase0.94.2 to retain deleted data

Posted by PG <pe...@gmail.com>.
The comment clears things up. Configuration through Java code works too. Thanks.
Yun

On Oct 22, 2012, at 12:34 AM, lars hofhansl <lh...@yahoo.com> wrote:

> There currently is not. You have to set up your column families to support this (note that you only need to do that once per column family!).
> This was done for flexibility, because in many cases only some of the tables need to retain deleted cells.
> 
> These blog posts might be helpful:
> http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html
> http://hadoop-hbase.blogspot.com/2011/12/hbase-data-rentention-options.html
> 
> -- Lars

Re: How to config hbase0.94.2 to retain deleted data

Posted by lars hofhansl <lh...@yahoo.com>.
There currently is not. You have to set up your column families to support this (note that you only need to do that once per column family!).
This was done for flexibility, because in many cases only some of the tables need to retain deleted cells.

These blog posts might be helpful:
http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html
http://hadoop-hbase.blogspot.com/2011/12/hbase-data-rentention-options.html

-- Lars




----- Original Message -----
From: yun peng <pe...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Sunday, October 21, 2012 5:20 PM
Subject: Re: How to config hbase0.94.2 to retain deleted data

Hi, Lars, that is a great point. It works if I update the
table descriptor with the table disabled. It looks like updating the
table descriptor online generally does not work...

Besides the Java API, is there any XML setting (in HBase 0.94.2)
that can configure KEEP_DELETED_CELLS?

Thanks a lot.
Yun

Re: How to config hbase0.94.2 to retain deleted data

Posted by yun peng <pe...@gmail.com>.
Hi, Lars, that is a great point. It works if I update the
table descriptor with the table disabled. It looks like updating the
table descriptor online generally does not work...

Besides the Java API, is there any XML setting (in HBase 0.94.2)
that can configure KEEP_DELETED_CELLS?

Thanks a lot.
Yun

On Sun, Oct 21, 2012 at 7:04 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Not sure that you can change the Table or Column Descriptors this way through a coprocessor.
> Did you try to create (or alter) the table such that keepDeletedCells is true:
>
> hbase(main):026:0> create 'usertable', {NAME=>'cf', KEEP_DELETED_CELLS=>true}
> 0 row(s) in 1.1660 seconds
>
> hbase(main):027:0> put 'usertable', "key1", 'cf:c1', "v1", 99
> 0 row(s) in 0.0320 seconds
>
> hbase(main):028:0> delete 'usertable', "key1", 'cf:c1', 100
> 0 row(s) in 0.0050 seconds
>
> hbase(main):029:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP=> 99, VERSIONS => 4}
> COLUMN                CELL
>  cf:c1                timestamp=99, value=v1
> 1 row(s) in 0.0150 seconds
>
> Let me know how this works for you (generally). This is a new feature I added to 0.94 to support true time-range queries.
>
> -- Lars

Re: How to config hbase0.94.2 to retain deleted data

Posted by lars hofhansl <lh...@yahoo.com>.
Not sure that you can change the Table or Column Descriptors this way through a coprocessor.
Did you try to create (or alter) the table such that keepDeletedCells is true:

hbase(main):026:0> create 'usertable', {NAME=>'cf', KEEP_DELETED_CELLS=>true}
0 row(s) in 1.1660 seconds

hbase(main):027:0> put 'usertable', "key1", 'cf:c1', "v1", 99
0 row(s) in 0.0320 seconds

hbase(main):028:0> delete 'usertable', "key1", 'cf:c1', 100
0 row(s) in 0.0050 seconds

hbase(main):029:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP=> 99, VERSIONS => 4}
COLUMN                CELL                                                     
 cf:c1                timestamp=99, value=v1                                   
1 row(s) in 0.0150 seconds

Let me know how this works for you (generally). This is a new feature I added to 0.94 to support true time-range queries.

-- Lars
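To make the difference between the two transcripts concrete, here is a small, self-contained Java toy model of a single column. It is not HBase code: `ToyColumn` and its methods are hypothetical. The masking rule is an assumption based on the HBase reference guide's description of KEEP_DELETED_CELLS: a Get or Scan whose time range ends at or before a delete marker's timestamp still sees the cells that marker deleted. The model also assumes, consistent with both transcripts, that the shell's `delete` with a timestamp hides every version of the column at or before that timestamp, and that `TIMESTAMP => t` means the half-open range [t, t+1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of one column (e.g. 'cf:c1' of row 'key1'); NOT HBase code.
// Assumption: with KEEP_DELETED_CELLS, a delete marker is ignored by a query
// whose time range [minTs, maxTs) ends at or before the marker's timestamp.
class ToyColumn {
    private final boolean keepDeletedCells;
    private final TreeMap<Long, String> versions = new TreeMap<>(); // ts -> value
    private final List<Long> deleteMarkers = new ArrayList<>();     // each hides versions <= ts

    ToyColumn(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    // Like the shell's: put 'usertable', 'key1', 'cf:c1', value, ts
    void put(String value, long ts) {
        versions.put(ts, value);
    }

    // Like the shell's: delete 'usertable', 'key1', 'cf:c1', ts
    void delete(long ts) {
        deleteMarkers.add(ts);
    }

    // Visible values in the time range [minTs, maxTs), newest first.
    List<String> get(long minTs, long maxTs) {
        List<String> result = new ArrayList<>();
        for (Long ts : versions.descendingKeySet()) {
            if (ts < minTs || ts >= maxTs) {
                continue; // outside the requested time range
            }
            if (!isHidden(ts, maxTs)) {
                result.add(versions.get(ts));
            }
        }
        return result;
    }

    // Like the shell's {TIMESTAMP => ts}: the range [ts, ts + 1).
    List<String> getAt(long ts) {
        return get(ts, ts + 1);
    }

    private boolean isHidden(long cellTs, long maxTs) {
        for (long markerTs : deleteMarkers) {
            if (cellTs > markerTs) {
                continue; // this marker does not cover the version
            }
            if (keepDeletedCells && markerTs >= maxTs) {
                continue; // the query ends before the delete: still visible
            }
            return true;
        }
        return false;
    }
}
```

Replaying the thread with `put("v1", 99)`, `put("v2", 101)` and `delete(100)`: with `keepDeletedCells = false`, `getAt(99)` returns nothing, as in the first transcript. With `true`, it returns `v1`, as in the second. A query over all time still honors the delete and returns only `v2`.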


Re: How to config hbase0.94.2 to retain deleted data

Posted by lars hofhansl <lh...@yahoo.com>.
Maybe this should be stated more clearly in the documentation.

If you need to perform time range queries (Scans/Gets as of time T) and you want those to be correct even when data was marked for delete, you need this enabled.
If you do not care about the history of your data, or you do not delete data, you won't need it.

This has nothing to do with how the cells were marked for delete (by the entire column family, column, or version). Versioning is done per cell in HBase.



________________________________
 From: Michael Segel <mi...@hotmail.com>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
Sent: Tuesday, October 23, 2012 11:40 AM
Subject: Re: How to config hbase0.94.2 to retain deleted data
 
Lars, 

No, that is not what I am suggesting. 

Perhaps I am missing something. Was the OP interested in cells or in row deletes?

Two different issues. 

On Oct 23, 2012, at 1:35 PM, lars hofhansl <lh...@yahoo.com> wrote:

> HBase has time range queries. You can say "give me the data as of time T" or "give me the data between X and Y". How far back you want to retain your data is specified via TTL and VERSIONS.
> 
> But... If you delete the data at T+X (X>0), a query as of time T won't return anything, even though at T the data was still there.
> 
> If you don't use TTL and/or VERSIONS in HBase you won't need this feature.
> 
> If you do use these you're doing so because you want get to the older data. And you delete stuff, chances are you want KEEP_DELETED_CELLS enabled.
> So within the boundaries specified by TTL/VERSIONS you can get to the data as of any time.
> 
> 
> By your logic nobody should use TTL/VERSIONS, which is nonsense.
> 
> 
> 
> ________________________________
> From: Michael Segel <mi...@hotmail.com>
> To: lars hofhansl <lh...@yahoo.com> 
> Cc: "user@hbase.apache.org" <us...@hbase.apache.org> 
> Sent: Tuesday, October 23, 2012 4:41 AM
> Subject: Re: How to config hbase0.94.2 to retain deleted data
> 
> "Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete markers. "
> 
> This is different from the idea suggested by the OP. Here deleted cells still get deleted. Just that when the compaction flag comes along, it's told to ignore them.
> 
> So if I say a column can have 3 versions (cells), then if I insert another value for that row:column key, I push that deleted cell down the stack. Enough times, it's gone.
> 
> In theory, this feature would be useful if I wanted an OLTP implementation on top of HBase. It would allow the transaction to bridge a compaction cycle. However, that's pretty much it.
> 
> This feature doesn't translate well beyond this.
> 
> It also raises the following question: how do I handle long (OLTP) transactions, timeouts, and isolation levels?
> 
> If you look at this at the row level... definitely not a good idea. Think of fat clogging an artery.
>  
> On Oct 23, 2012, at 12:22 AM, lars hofhansl <lh...@yahoo.com> wrote:
> 
>> http://hbase.apache.org/book/cf.keep.deleted.html
>> 
>> Without it you cannot do correct as-of-time queries when it comes to deletes.
>> 
>> -- Lars
>> 
>> From: Michael Segel <mi...@hotmail.com>
>> To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
>> Sent: Monday, October 22, 2012 9:18 PM
>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>> 
>>> 
>>> Curious, why do you think this is better than using the keep-deleted-cells feature?
>>> (It might well be, just curious)
>> 
>> Ok... so what exactly does this feature mean? 
>> 
>> Suppose I have 500 rows within a region. I set this feature to be true. 
>> I do a massive delete and there are only 50 rows left standing. 
>> 
>> So if I do a count of the number of rows in the region, I see only 50, yet if I compact the table, it's still full.
>> 
>> Granted, I'm talking about rows and not cells, but the idea is the same. IMHO you're asking for more headaches than you solve.
>> 
>> KISS would suggest that moving deleted data into a different table would yield better performance in the long run.
>> 
>> 
>> On Oct 21, 2012, at 7:23 PM, lars hofhansl <lh...@yahoo.com> wrote:
>> 
>>> That'd work too. Requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc)
>>> 
>>> 
>>> Curious, why do you think this is better than using the keep-deleted-cells feature?
>>> (It might well be, just curious)
>>> 
>>> 
>>> -- Lars
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Michael Segel <mi...@hotmail.com>
>>> To: user@hbase.apache.org
>>> Cc: 
>>> Sent: Sunday, October 21, 2012 4:34 PM
>>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>>> 
>>> I would suggest that you use your coprocessor to copy the data to a 'backup' table when you mark them for delete. 
>>> Then as major compaction hits, the rows are deleted from the main table, but still reside undeleted in your delete table. 
>>> Call it a history table. 
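Michael's alternative in the quoted thread is to copy cells to a separate history table when they are deleted. It can be sketched with a tiny in-memory model. This is only an illustration under stated assumptions: `ToyTable` and `HistoryOnDelete` are hypothetical names, not the HBase coprocessor API. The toy also purges deleted versions immediately instead of waiting for a major compaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory "table": column -> (timestamp -> value). NOT HBase code.
class ToyTable {
    final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    void put(String column, long ts, String value) {
        cells.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, value);
    }

    // Live view of the versions a column delete at ts would cover (ts' <= ts).
    NavigableMap<Long, String> versionsUpTo(String column, long ts) {
        TreeMap<Long, String> versions = cells.get(column);
        return versions == null ? new TreeMap<Long, String>() : versions.headMap(ts, true);
    }

    // Physically removes the covered versions (in real HBase, a major
    // compaction would eventually do this after the delete).
    void purgeUpTo(String column, long ts) {
        versionsUpTo(column, ts).clear();
    }
}

// Plays the role of a pre-delete hook: copy to history first, then delete.
class HistoryOnDelete {
    final ToyTable main = new ToyTable();
    final ToyTable history = new ToyTable();

    void deleteColumn(String column, long ts) {
        for (Map.Entry<Long, String> e : main.versionsUpTo(column, ts).entrySet()) {
            history.put(column, e.getKey(), e.getValue());
        }
        main.purgeUpTo(column, ts);
    }
}
```

As Lars points out, a real coprocessor version would also have to write to regions hosted on other region servers. It would also need to trap every kind of mutation (Put, Delete, Increment, Append, RowMutations), not just deletes. The toy covers only column deletes.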

Re: How to config hbase0.94.2 to retain deleted data

Posted by Michael Segel <mi...@hotmail.com>.
Lars, 

No, that is not what I am suggesting. 

Perhaps I am missing something. Was the OP interested in cells or in row deletes?

Two different issues. 

On Oct 23, 2012, at 1:35 PM, lars hofhansl <lh...@yahoo.com> wrote:

> HBase has time range queries. You can say "give me the data as of time T" or "give me the data between X and Y". How far back you want to retain your data is specified via TTL and VERSIONS.
> 
> But... If you delete the data at T+X (X>0), a query as of time T won't return anything, even though at T the data was still there.
> 
> If you don't use TTL and/or VERSIONS in HBase you won't need this feature.
> 
> If you do use these, you're doing so because you want to get to the older data. And if you delete stuff, chances are you want KEEP_DELETED_CELLS enabled.
> So within the boundaries specified by TTL/VERSIONS you can get to the data as of any time.
> 
> 
> By your logic nobody should use TTL/VERSIONS, which is nonsense.

Re: How to config hbase0.94.2 to retain deleted data

Posted by lars hofhansl <lh...@yahoo.com>.
HBase has time range queries. You can say "give me the data as of time T" or "give me the data between X and Y". How far back you want to retain your data is specified via TTL and VERSIONS.

But... If you delete the data at T+X (X>0), a query as of time T won't return anything, even though at T the data was still there.

If you don't use TTL and/or VERSIONS in HBase you won't need this feature.

If you do use these, you're doing so because you want to get to the older data. And if you delete stuff, chances are you want KEEP_DELETED_CELLS enabled.
So within the boundaries specified by TTL/VERSIONS you can get to the data as of any time.


By your logic nobody should use TTL/VERSIONS, which is nonsense.



________________________________
 From: Michael Segel <mi...@hotmail.com>
To: lars hofhansl <lh...@yahoo.com> 
Cc: "user@hbase.apache.org" <us...@hbase.apache.org> 
Sent: Tuesday, October 23, 2012 4:41 AM
Subject: Re: How to config hbase0.94.2 to retain deleted data
 
"Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete markers. "

This is different from the idea suggested by the OP. Here deleted cells still get deleted. Just that when the compaction flag comes along, it's told to ignore them.

So if I say a column can have 3 versions (cells), then if I insert another value for that row:column key, I push that deleted cell down the stack. Enough times, it's gone.

In theory, this feature would be useful if I wanted an OLTP implementation on top of HBase. It would allow the transaction to bridge a compaction cycle. However, that's pretty much it.

This feature doesn't translate well beyond this.

It also raises the following question: how do I handle long (OLTP) transactions, timeouts, and isolation levels?

If you look at this at the row level... definitely not a good idea. Think of fat clogging an artery.
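The retention rule quoted at the top of this message says deleted cells are still subject to TTL and to the VERSIONS limit. That rule, and the "push it down the stack" scenario, can be illustrated with a simplified model of what a major compaction keeps for one column. `RetentionModel` is hypothetical. It counts deleted and live versions together against VERSIONS, as Michael describes; HBase's actual bookkeeping during compaction may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of what a major compaction keeps for one column.
class RetentionModel {
    private final int maxVersions;        // the family's VERSIONS
    private final long ttlMillis;         // the family's TTL (Long.MAX_VALUE = forever)
    private final boolean keepDeletedCells;
    private final TreeMap<Long, Boolean> versions = new TreeMap<>(); // ts -> deleted?

    RetentionModel(int maxVersions, long ttlMillis, boolean keepDeletedCells) {
        this.maxVersions = maxVersions;
        this.ttlMillis = ttlMillis;
        this.keepDeletedCells = keepDeletedCells;
    }

    void put(long ts) {
        versions.put(ts, false);
    }

    // Column delete at ts: marks every version at or before ts as deleted.
    void delete(long ts) {
        List<Long> covered = new ArrayList<>(versions.headMap(ts, true).keySet());
        for (Long k : covered) {
            versions.put(k, true);
        }
    }

    // Keeps, newest first, at most maxVersions versions that are within TTL
    // (and, unless keepDeletedCells is set, not deleted). Assumes now >= 0.
    void majorCompact(long now) {
        TreeMap<Long, Boolean> kept = new TreeMap<>();
        for (Map.Entry<Long, Boolean> e : versions.descendingMap().entrySet()) {
            if (kept.size() == maxVersions) {
                break; // older versions fall off the VERSIONS "stack"
            }
            boolean expired = e.getKey() < now - ttlMillis;
            boolean purgeDeleted = e.getValue() && !keepDeletedCells;
            if (!expired && !purgeDeleted) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        versions.clear();
        versions.putAll(kept);
    }

    boolean contains(long ts) {
        return versions.containsKey(ts);
    }

    int size() {
        return versions.size();
    }
}
```

With VERSIONS => 3 and KEEP_DELETED_CELLS on, a deleted version survives compactions until three newer versions exist, which matches the "enough times, it's gone" description. Without the flag, the deleted version is purged at the first major compaction.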
  
On Oct 23, 2012, at 12:22 AM, lars hofhansl <lh...@yahoo.com> wrote:

> http://hbase.apache.org/book/cf.keep.deleted.html
> 
> Without it you cannot do correct as-of-time queries when it comes to deletes.
> 
> -- Lars

Re: How to config hbase0.94.2 to retain deleted data

Posted by Michael Segel <mi...@hotmail.com>.
"Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete markers. "

This is different from the idea suggested by the OP. Here deleted cells still get deleted; it's just that when a compaction comes along, the flag tells it to ignore them.

So if I say a column can have 3 versions (cells), then each time I insert another value for that row:column key, I push that deleted cell further down the stack. Do that enough times and it's gone.

In theory, this feature would be useful if I wanted an OLTP implementation on top of HBase. It would allow the transaction to bridge a compaction cycle. However, that's pretty much it. 

This feature doesn't translate well beyond this. 

It also raises the following questions: how do I handle long (OLTP) transactions, timeouts, and isolation levels?

If you look at this at the row level... definitely not a good idea. Think of fat clogging an artery.
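A toy model of the version-limit behaviour described above may make it more concrete. This is not HBase code. The class `VersionBudgetModel` is hypothetical. Following the description in this post, it assumes that retained deleted cells count against the same VERSIONS limit as live cells. For brevity it also tracks only a single delete-column marker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model, not HBase code: one column with KEEP_DELETED_CELLS
 * enabled, where (by assumption) deleted and live versions share one VERSIONS budget.
 */
class VersionBudgetModel {
    private final int maxVersions;
    // timestamp -> value, newest first
    private final TreeMap<Long, String> versions =
            new TreeMap<Long, String>(Collections.reverseOrder());
    // a single "delete column" marker masks every version with ts <= marker
    private long deleteMarker = Long.MIN_VALUE;

    VersionBudgetModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    void deleteColumn(long ts) {
        deleteMarker = Math.max(deleteMarker, ts);
    }

    /** Toy compaction: keep only the newest maxVersions cells, deleted or not. */
    void majorCompact() {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // in reverse order the last entry is the oldest cell
        }
    }

    /** Deleted cells that survived, i.e. still reachable by an as-of-time query. */
    List<String> retainedDeletedCells() {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (e.getKey() <= deleteMarker) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VersionBudgetModel col = new VersionBudgetModel(3);
        col.put(99, "v1");
        col.deleteColumn(100);
        col.put(101, "v2");
        col.majorCompact();
        System.out.println(col.retainedDeletedCells()); // [v1]
        col.put(102, "v3");
        col.put(103, "v4");
        col.majorCompact();
        System.out.println(col.retainedDeletedCells()); // [] (v1 was pushed out)
    }
}
```

With VERSIONS set to 3, the deleted v1 survives the first compaction. After two more puts, it falls off the end of the stack.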
   
On Oct 23, 2012, at 12:22 AM, lars hofhansl <lh...@yahoo.com> wrote:

> http://hbase.apache.org/book/cf.keep.deleted.html
> 
> Without it you cannot do correct as-of-time queries when it comes to deletes.
> 
> -- Lars


Re: How to config hbase0.94.2 to retain deleted data

Posted by lars hofhansl <lh...@yahoo.com>.
http://hbase.apache.org/book/cf.keep.deleted.html

Without it you cannot do correct as-of-time queries when it comes to deletes.


-- Lars
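To make the as-of-time point concrete, here is a toy in-memory model that replays the sequence from the original question: put v1 at 99, put v2 at 101, delete at 100, then read with TIMESTAMP => 99. It is a simplification, not HBase's implementation, and `ToyColumn` is hypothetical. The model rests on two assumptions, both consistent with the shell output in the question:

- Without KEEP_DELETED_CELLS, every delete-column marker masks older cells, regardless of the query's time range.
- With KEEP_DELETED_CELLS, a marker at or beyond the end of the query's time range is ignored.

The shell's TIMESTAMP => 99 is modeled as the half-open range [99, 100).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy model of one column, used only to illustrate KEEP_DELETED_CELLS. */
class ToyColumn {
    private final boolean keepDeletedCells;
    // timestamp -> value, newest first
    private final TreeMap<Long, String> puts =
            new TreeMap<Long, String>(Collections.reverseOrder());
    // "delete column" markers: each masks every put with ts <= marker
    private final List<Long> deleteMarkers = new ArrayList<Long>();

    ToyColumn(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    void put(long ts, String value) { puts.put(ts, value); }

    void deleteColumn(long ts) { deleteMarkers.add(ts); }

    /** Unmasked values with minTs <= ts < maxTs, newest first. */
    List<String> get(long minTs, long maxTs) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<Long, String> e : puts.entrySet()) {
            long ts = e.getKey();
            if (ts >= minTs && ts < maxTs && !isMasked(ts, maxTs)) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    private boolean isMasked(long putTs, long queryMaxTs) {
        for (long marker : deleteMarkers) {
            // Toy rule: without KEEP_DELETED_CELLS every marker counts, even one
            // outside the query's time range; with it, such markers are ignored.
            boolean applies = !keepDeletedCells || marker < queryMaxTs;
            if (applies && marker >= putTs) {
                return true;
            }
        }
        return false;
    }

    /** Toy major compaction: purge masked cells and markers unless KEEP_DELETED_CELLS is set. */
    void majorCompact() {
        if (keepDeletedCells) {
            return; // VERSIONS and TTL limits are ignored in this toy
        }
        List<Long> doomed = new ArrayList<Long>();
        for (long ts : puts.keySet()) {
            if (isMasked(ts, Long.MAX_VALUE)) {
                doomed.add(ts);
            }
        }
        puts.keySet().removeAll(doomed);
        deleteMarkers.clear();
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            ToyColumn c = new ToyColumn(keep);
            c.put(99, "v1");
            c.put(101, "v2");
            c.deleteColumn(100);
            c.majorCompact();
            // like: get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 99}
            // prints [] when keep is false and [v1] when keep is true
            System.out.println("KEEP_DELETED_CELLS=" + keep + " -> " + c.get(99, 100));
        }
    }
}
```

The first run shows why the as-of-99 read in the question came back empty. The second run shows what the flag is meant to preserve, even across a major compaction.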



________________________________
 From: Michael Segel <mi...@hotmail.com>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
Sent: Monday, October 22, 2012 9:18 PM
Subject: Re: How to config hbase0.94.2 to retain deleted data
 
> 
> Curious, why do you think this is better than using the keep-deleted-cells feature?
> (It might well be, just curious)

Ok... so what exactly does this feature mean? 

Suppose I have 500 rows within a region. I set this feature to be true. 
I do a massive delete and there are only 50 rows left standing. 

So if I count the rows in the region, I see only 50, yet if I compact the table, it's still full.

Granted, I'm talking about rows and not cells, but the idea is the same. IMHO you're asking for more headaches than you solve.

KISS would suggest that moving deleted data into a different table would yield better performance in the long run.


Re: How to config hbase0.94.2 to retain deleted data

Posted by Michael Segel <mi...@hotmail.com>.
> 
> Curious, why do you think this is better than using the keep-deleted-cells feature?
> (It might well be, just curious)

Ok... so what exactly does this feature mean? 

Suppose I have 500 rows within a region. I set this feature to be true. 
I do a massive delete and there are only 50 rows left standing. 

So if I count the rows in the region, I see only 50, yet if I compact the table, it's still full.

Granted, I'm talking about rows and not cells, but the idea is the same. IMHO you're asking for more headaches than you solve.

KISS would suggest that moving deleted data into a different table would yield better performance in the long run.
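Here is a toy model of the 500-row example above. It assumes one cell per row and treats KEEP_DELETED_CELLS as simply stopping major compaction from purging deleted rows; version and TTL limits are ignored. `RegionRowCountModel` is hypothetical. It only shows why the visible row count and the stored size diverge.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy region: rowKey -> "deleted" flag, one cell per row. */
class RegionRowCountModel {
    private final boolean keepDeletedCells;
    private final Map<String, Boolean> rows = new TreeMap<String, Boolean>();

    RegionRowCountModel(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    void put(String row) { rows.put(row, Boolean.FALSE); }

    /** A delete only marks the row; the data stays until compaction. */
    void delete(String row) {
        if (rows.containsKey(row)) {
            rows.put(row, Boolean.TRUE);
        }
    }

    /** What a client-side row count sees. */
    long visibleRows() {
        long n = 0;
        for (boolean deleted : rows.values()) {
            if (!deleted) n++;
        }
        return n;
    }

    /** What still occupies space in the store files. */
    int storedRows() { return rows.size(); }

    void majorCompact() {
        if (!keepDeletedCells) {
            rows.values().removeIf(deleted -> deleted);
        }
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            RegionRowCountModel region = new RegionRowCountModel(keep);
            for (int i = 0; i < 500; i++) region.put(String.format("row%03d", i));
            for (int i = 50; i < 500; i++) region.delete(String.format("row%03d", i));
            region.majorCompact();
            // keep=false: visible=50 stored=50; keep=true: visible=50 stored=500
            System.out.println("keep=" + keep + " visible=" + region.visibleRows()
                    + " stored=" + region.storedRows());
        }
    }
}
```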


On Oct 21, 2012, at 7:23 PM, lars hofhansl <lh...@yahoo.com> wrote:

> That'd work too. Requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc)
> 
> 
> Curious, why do you think this is better than using the keep-deleted-cells feature?
> (It might well be, just curious)
> 
> 
> -- Lars


Re: How to config hbase0.94.2 to retain deleted data

Posted by Michael Segel <mi...@hotmail.com>.
Lars, 

As with secondary indexes, doing remote updates to other region servers isn't necessarily a bad thing.

There are ways to mitigate some of the cost of updating the second table; for instance, the actual update doesn't have to be synchronous.

HTH

-Mike
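Here is a minimal sketch of the non-synchronous update, assuming a single background thread per server is acceptable. `AsyncHistoryWriter` is hypothetical. The `ConcurrentHashMap` only stands in for the remote history table; a real coprocessor would do that write through the HBase client instead. The trade-off is durability: anything still queued when the server dies never reaches the history table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that takes the history-table write off the delete path. */
class AsyncHistoryWriter implements AutoCloseable {
    // Stand-in for the remote history table (a real version would use the HBase client).
    private final Map<String, String> historyTable = new ConcurrentHashMap<String, String>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Called on the delete path; queues the copy and returns immediately. */
    void recordDeleted(final String rowKey, final String oldValue) {
        worker.execute(() -> historyTable.put(rowKey, oldValue));
    }

    Map<String, String> history() { return historyTable; }

    /** Drains queued writes; anything still queued if the process dies is lost. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                worker.shutdownNow();
            }
        } catch (InterruptedException e) {
            worker.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncHistoryWriter writer = new AsyncHistoryWriter();
        writer.recordDeleted("key1", "v1");
        writer.recordDeleted("key2", "v2");
        writer.close(); // wait for the queued copies before reading
        System.out.println(writer.history().size()); // 2
    }
}
```

The caller's delete returns as soon as the copy is queued. The cost of the remote write is paid on the worker thread instead.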

On Oct 21, 2012, at 7:23 PM, lars hofhansl <lh...@yahoo.com> wrote:

> That'd work too. Requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc)
> 
> 
> Curious, why do you think this is better than using the keep-deleted-cells feature?
> (It might well be, just curious)
> 
> 
> -- Lars


Re: How to config hbase0.94.2 to retain deleted data

Posted by lars hofhansl <lh...@yahoo.com>.
That'd work too. It requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc.).


Curious, why do you think this is better than using the keep-deleted-cells feature?
(It might well be, just curious)


-- Lars
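The point about trapping every change can be illustrated with a toy single-column table. In it, every mutation kind calls the same hook before it modifies the row. `TrappingTable` and its methods are hypothetical and are not the RegionObserver API. RowMutations is not modeled, and HBase's versioning is ignored: the model simply treats each mutation as replacing the previous value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy table: every mutation kind goes through one history hook. */
class TrappingTable {
    enum Kind { PUT, DELETE, INCREMENT, APPEND }

    private final Map<String, String> current = new HashMap<String, String>();
    // Stand-in for the history table, entries look like "row=oldValue (KIND)"
    private final List<String> history = new ArrayList<String>();

    // Single hook; forgetting to call it on any one path silently loses history.
    private void beforeChange(Kind kind, String row) {
        String old = current.get(row);
        if (old != null) {
            history.add(row + "=" + old + " (" + kind + ")");
        }
    }

    void put(String row, String value) {
        beforeChange(Kind.PUT, row);
        current.put(row, value);
    }

    void delete(String row) {
        beforeChange(Kind.DELETE, row);
        current.remove(row);
    }

    void increment(String row, long amount) {
        beforeChange(Kind.INCREMENT, row);
        String old = current.get(row);
        long base = (old == null) ? 0L : Long.parseLong(old);
        current.put(row, Long.toString(base + amount));
    }

    void append(String row, String suffix) {
        beforeChange(Kind.APPEND, row);
        String old = current.get(row);
        current.put(row, (old == null ? "" : old) + suffix);
    }

    String get(String row) { return current.get(row); }

    List<String> history() { return history; }

    public static void main(String[] args) {
        TrappingTable t = new TrappingTable();
        t.put("key1", "1");
        t.increment("key1", 2);
        t.append("key1", "x");
        t.delete("key1");
        // [key1=1 (INCREMENT), key1=3 (APPEND), key1=3x (DELETE)]
        System.out.println(t.history());
    }
}
```

Trapping only deletes would keep just the last entry. The values replaced by the Increment and the Append would never reach the history table.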



----- Original Message -----
From: Michael Segel <mi...@hotmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Sunday, October 21, 2012 4:34 PM
Subject: Re: How to config hbase0.94.2 to retain deleted data

I would suggest that you use your coprocessor to copy the data to a 'backup' table when you mark it for deletion.
Then, as major compactions hit, the rows are removed from the main table but still reside, undeleted, in your backup table.
Call it a history table.


Re: How to config hbase0.94.2 to retain deleted data

Posted by Marcos Ortiz Valmaseda <ml...@uci.cu>.
+1 for this solution.
A history table can solve this with less trouble.

----- Original Message -----
From: Michael Segel <mi...@hotmail.com>
To: user@hbase.apache.org
Sent: Sun, 21 Oct 2012 19:34:04 -0400 (CDT)
Subject: Re: How to config hbase0.94.2 to retain deleted data

I would suggest that you use your coprocessor to copy the data to a 'backup' table when you mark it for deletion.
Then, as major compactions hit, the rows are removed from the main table but still reside, undeleted, in your backup table.
Call it a history table.


10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci


Re: How to config hbase0.94.2 to retain deleted data

Posted by Michael Segel <mi...@hotmail.com>.
I would suggest that you use your coprocessor to copy the data to a 'backup' table when you mark it for deletion.
Then, as major compactions hit, the rows are removed from the main table but still reside, undeleted, in your backup table.
Call it a history table.
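Here is a toy model of this suggestion, under stated assumptions:

- `HistoryTableModel` is hypothetical.
- Its `delete` method plays the role of the coprocessor's delete hook.
- Plain maps stand in for the main and history tables.
- A put issued after a delete is assumed to carry a newer timestamp, so it is visible again.

It shows the property being described: after a major compaction the row is gone from the main table, but its last value is still in the history table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy: main table with delete markers plus a separate history table. */
class HistoryTableModel {
    private final Map<String, String> mainTable = new HashMap<String, String>();
    private final Set<String> deleteMarkers = new HashSet<String>();
    private final Map<String, List<String>> historyTable = new HashMap<String, List<String>>();

    void put(String row, String value) {
        mainTable.put(row, value);
        deleteMarkers.remove(row); // toy assumption: the new put is newer than the marker
    }

    /** Plays the role of the delete hook: copy the live value first, then mark the row. */
    void delete(String row) {
        String live = get(row);
        if (live != null) {
            historyTable.computeIfAbsent(row, k -> new ArrayList<String>()).add(live);
        }
        deleteMarkers.add(row);
    }

    String get(String row) {
        return deleteMarkers.contains(row) ? null : mainTable.get(row);
    }

    /** Toy major compaction: masked rows and their markers disappear from the main table. */
    void majorCompact() {
        mainTable.keySet().removeAll(deleteMarkers);
        deleteMarkers.clear();
    }

    int storedMainRows() { return mainTable.size(); }

    List<String> history(String row) {
        List<String> h = historyTable.get(row);
        return h == null ? Collections.<String>emptyList() : h;
    }

    public static void main(String[] args) {
        HistoryTableModel m = new HistoryTableModel();
        m.put("key1", "v1");
        m.delete("key1");
        m.majorCompact();
        System.out.println(m.storedMainRows() + " " + m.history("key1")); // 0 [v1]
    }
}
```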


On Oct 21, 2012, at 3:53 PM, yun peng <pe...@gmail.com> wrote:

> Hi, All,
> I want to retain all deleted key-value pairs in hbase. I have tried to
> config HColumnDescript as follow to make it return deleted.
> 
>  public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
>    HTableDescriptor htd = e.getEnvironment().getRegion().getTableDesc();
>    HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
>    hcd.setKeepDeletedCells(true);
>    hcd.setBlockCacheEnabled(false);
>  }
> 
> However, it does not work for me, as when I issued a delete and then query
> by an older timestamp, the old data does not show up.
> 
> hbase(main):119:0> put 'usertable', "key1", 'cf:c1', "v1", 99
> hbase(main):120:0> put 'usertable', "key1", 'cf:c1', "v2", 101
> hbase(main):121:0> delete 'usertable', "key1", 'cf:c1', 100
> hbase(main):122:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
> => 99, VERSIONS => 4}
> COLUMN                CELL
> 
> 0 row(s) in 0.0040 seconds
> 
> hbase(main):123:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
> => 100, VERSIONS => 4}
> COLUMN                CELL
> 
> 0 row(s) in 0.0050 seconds
> 
> hbase(main):124:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
> => 101, VERSIONS => 4}
> COLUMN                CELL
> 
> cf:c1                timestamp=101, value=v2
> 
> 1 row(s) in 0.0050 seconds
> 
> Note this is a new feature in 0.94.2
> (HBASE-4536<https://issues.apache.org/jira/browse/HBASE-4536>),
> I did not find too many sample code online, so... any one here has
> experience in using HBASE-4536. How should one config
> hbase to enable this feature in hbase?
> 
> Thanks
> Yun