Posted to user@hbase.apache.org by Ian Varley <iv...@salesforce.com> on 2011/08/14 03:51:49 UTC

TTL for cell values

Hi all,

Quick clarification on TTL for cells. The concept makes sense (instead of "keep 3 versions" you say "keep versions more recent than time T"). But, if there's only 1 value in the cell, and that value is older than the TTL, will it also be deleted?

If so, has there ever been discussion of a "TTL except for most recent" option? (i.e. you want the current version to be permanently persistent, but also want some time-based range of version history, so you can peek back and get consistent snapshots within the last hour, 6 hours, 24 hours, etc). TTL seems perfect for this, but not if it'll chomp the current version of cells too! :)
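
To make the difference concrete, here is a minimal Python sketch of the two policies in question. It is a toy model, not HBase code: the function name `surviving_versions`, the `keep_newest` flag, and the list-of-tuples representation of a cell are all assumptions made for illustration.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp in seconds, value); hypothetical representation


def surviving_versions(versions: List[Version], now: int, ttl: int,
                       keep_newest: bool = False) -> List[Version]:
    """Return the versions of one cell that survive TTL expiry, newest first.

    keep_newest=False models plain TTL: anything older than `ttl` is dropped.
    keep_newest=True models the hypothetical "TTL except for most recent"
    option: the newest version is kept no matter how old it is.
    """
    ordered = sorted(versions, key=lambda v: v[0], reverse=True)
    kept = [v for v in ordered if now - v[0] <= ttl]
    if keep_newest and ordered and ordered[0] not in kept:
        kept.insert(0, ordered[0])
    return kept


if __name__ == "__main__":
    cell = [(100, "old"), (200, "current")]
    # Both versions are older than the one-hour TTL at now=10_000.
    print(surviving_versions(cell, now=10_000, ttl=3600))
    print(surviving_versions(cell, now=10_000, ttl=3600, keep_newest=True))
```

Under plain TTL the cell ends up empty once every version has aged out; with the assumed `keep_newest` flag the current value stays while the older history is still trimmed by age.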

Thanks!
Ian

Re: When are versions collected?

Posted by Stack <st...@duboce.net>.
On Wed, Aug 17, 2011 at 3:54 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Thanks J-D, that's exactly what I was looking for. I guess I was expecting that a manual major compaction would imply a cache flush. Should it?
>

It does not.

On the other hand, we have yet to implement a Bigtable optimization
where a flush is merged with a storefile; i.e., a new file is written
where the sources are the in-memory flushed edits and a storefile that
was already in the filesystem (the original storefile is dropped on
flush completion).
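
As a rough illustration of the idea (not of any HBase implementation), the merge could look like the Python sketch below. The `KeyValue` tuple layout, `sort_key`, and `flush_merged` are hypothetical names, and the sketch assumes the existing file is already sorted by key ascending and timestamp descending.

```python
import heapq
from typing import List, Tuple

KeyValue = Tuple[str, int, str]  # (row/column key, timestamp, value); assumed layout


def sort_key(kv: KeyValue) -> Tuple[str, int]:
    """Order by key ascending, then newest timestamp first."""
    return (kv[0], -kv[1])


def flush_merged(memstore_edits: List[KeyValue],
                 existing_file: List[KeyValue]) -> List[KeyValue]:
    """Write one new sorted 'file' from the flushed edits plus one existing file.

    `existing_file` must already be sorted by `sort_key`; the caller would then
    drop the original file once the new one is complete.
    """
    flushed = sorted(memstore_edits, key=sort_key)
    return list(heapq.merge(flushed, existing_file, key=sort_key))


if __name__ == "__main__":
    existing = [("r1/cf:x", 2, "b"), ("r1/cf:x", 1, "a"), ("r2/cf:x", 1, "z")]
    edits = [("r2/cf:x", 5, "y"), ("r1/cf:x", 3, "c")]
    for kv in flush_merged(edits, existing):
        print(kv)
```

The point of the optimization would be that the flush produces one file instead of adding another small file next to the old one.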

St.Ack

Re: When are versions collected?

Posted by lars hofhansl <lh...@yahoo.com>.
Thanks J-D, that's exactly what I was looking for. I guess I was expecting that a manual major compaction would imply a cache flush. Should it?


-- Lars




Re: When are versions collected?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
A major compaction only compacts data in the store files (i.e., on disk),
not data in the memstores (i.e., in memory). Force a flush of that table after
you do those puts and you should get what you expect.
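
A toy Python model of that behaviour, for readers who want to see why the extra versions survived. Nothing here is the HBase API: `ToyStore` and its methods are hypothetical, the model covers a single cell, and `all_versions` reports what is physically retained rather than what a plain scan would return.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp, value)


class ToyStore:
    """Toy model of one cell's storage: a memstore plus flushed store files."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        self.memstore: List[Version] = []           # recent edits, in memory
        self.store_files: List[List[Version]] = []  # flushed files, on disk

    def put(self, ts: int, value: str) -> None:
        self.memstore.append((ts, value))

    def flush(self) -> None:
        """Move the in-memory edits into a new store file."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore, reverse=True))
            self.memstore = []

    def major_compact(self) -> None:
        """Rewrite all store files into one, keeping only max_versions.

        The memstore is deliberately left untouched.
        """
        if not self.store_files:
            return
        merged = sorted((v for f in self.store_files for v in f), reverse=True)
        self.store_files = [merged[: self.max_versions]]

    def all_versions(self) -> List[Version]:
        """Every version still held anywhere, newest first."""
        on_disk = [v for f in self.store_files for v in f]
        return sorted(self.memstore + on_disk, reverse=True)


if __name__ == "__main__":
    store = ToyStore(max_versions=1)
    for ts, value in enumerate(["val1", "val2", "val3", "val4"], start=1):
        store.put(ts, value)
    store.major_compact()            # nothing on disk yet, so nothing is dropped
    print(len(store.all_versions()))
    store.flush()
    store.major_compact()            # now the old versions are collected
    print(store.all_versions())
```

In the real shell the corresponding steps would be `flush 'test'` followed by `major_compact 'test'`.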

J-D


When are versions collected?

Posted by lars hofhansl <lh...@yahoo.com>.
Hi All,
The following is a session in hbase shell.

You can see that VERSIONS = 1, yet even after triggering a major compaction I can still retrieve versions 2, 3, and 4 when querying by timestamp.

Does "major_compact" from the shell not trigger a major compaction?

Or does it just not do anything because there's only one small file to begin with?


(HBase version is 0.90.3-cdh3u1 and it was running in local mode, in case that makes a difference).


I apologize in advance if this is documented somewhere already.


Thanks.

-- Lars


--------------


hbase(main):003:0> describe 'test'
DESCRIPTION                                          ENABLED                    
 {NAME => 'test', FAMILIES => [{NAME => 'cf', BLOOMF true                       
 ILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESS                            
 ION => 'NONE', VERSIONS => '1', TTL => '2147483647'                            
 , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCK                            
 CACHE => 'true'}]}                                                             
1 row(s) in 0.0270 seconds

hbase(main):004:0> scan 'test'
ROW                   COLUMN+CELL                                               
0 row(s) in 0.1260 seconds

hbase(main):005:0> put 'test', 'r1', 'cf:x', 'val1'
0 row(s) in 0.0450 seconds

hbase(main):006:0> put 'test', 'r1', 'cf:x', 'val2' 
0 row(s) in 0.0130 seconds

hbase(main):007:0> put 'test', 'r1', 'cf:x', 'val3'
0 row(s) in 0.0160 seconds

hbase(main):008:0> put 'test', 'r1', 'cf:x', 'val4'
0 row(s) in 0.0120 seconds

hbase(main):009:0> scan 'test'
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360551865, value=val4          
1 row(s) in 0.0380 seconds

hbase(main):010:0> scan 'test', {TIMERANGE => [0, 1313360551865]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360550493, value=val3          
1 row(s) in 0.0310 seconds

hbase(main):011:0> scan 'test', {TIMERANGE => [0, 1313360550493]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360549220, value=val2          
1 row(s) in 0.0310 seconds

hbase(main):012:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360545784, value=val1          
1 row(s) in 0.0190 seconds

hbase(main):013:0> major_compact 'test'
0 row(s) in 0.0670 seconds

hbase(main):014:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360545784, value=val1          
1 row(s) in 0.0230 seconds

hbase(main):015:0> major_compact 'test'                          
0 row(s) in 0.0450 seconds

hbase(main):016:0> scan 'test', {TIMERANGE => [0, 1313360549220]}
ROW                   COLUMN+CELL                                               
 r1                   column=cf:x, timestamp=1313360545784, value=val1          
1 row(s) in 0.0300 seconds
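
The scans in the session step back one version at a time, which is consistent with the upper bound of TIMERANGE being exclusive: a range that ends at val4's timestamp returns val3. A small Python sketch of that half-open interval, using the timestamps from the session (the helper `newest_in_range` is hypothetical, not part of HBase):

```python
from typing import List, Optional, Tuple

Version = Tuple[int, str]  # (timestamp in milliseconds, value)

# Timestamps copied from the shell session.
CELL: List[Version] = [
    (1313360545784, "val1"),
    (1313360549220, "val2"),
    (1313360550493, "val3"),
    (1313360551865, "val4"),
]


def newest_in_range(versions: List[Version], min_ts: int,
                    max_ts: int) -> Optional[Version]:
    """Return the newest version with min_ts <= timestamp < max_ts, or None."""
    candidates = [v for v in versions if min_ts <= v[0] < max_ts]
    return max(candidates, key=lambda v: v[0]) if candidates else None


if __name__ == "__main__":
    # Mirrors: scan 'test', {TIMERANGE => [0, 1313360551865]}
    print(newest_in_range(CELL, 0, 1313360551865))
    # Mirrors: scan 'test', {TIMERANGE => [0, 1313360549220]}
    print(newest_in_range(CELL, 0, 1313360549220))
```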

Re: TTL for cell values

Posted by Andrew Purtell <ap...@apache.org>.
> It's part of the mindset shift you have to go through coming from a database world to a NoSQL world

This is useful. If you have more insights like this Ian and care to share them, I think we would be really interested to hear them.

Best regards,


   - Andy


Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: TTL for cell values

Posted by Ian Varley <iv...@salesforce.com>.
"I am slightly confused now. Time to live is used in networking, after n
hops drop this packet. Also used in memcache, expire this data n seconds
after insert. I do not know of any specific ttl features in rdbms so I do not understand
why someone would expect ttl to be permanently durable."

Edward, my mistake was originally assuming that the TTL applied only to *old* versions of a cell (i.e., not the most recent one). It was a misunderstanding on my part, based on the fact that there *are* no TTL features in an RDBMS (you only get rid of the current value by issuing an explicit delete).

HBase is light years ahead of an RDBMS in the way it explicitly handles the time dimension; I just wasn't expecting this facet of that behavior. It's part of the mindset shift you have to go through coming from a database world to a NoSQL world; you can treat parts of your store as a transient cache if you want to (which is useful in all kinds of situations). Just needed to expand my brain to consider that ... :)

Ian



Re: TTL for cell values

Posted by Edward Capriolo <ed...@gmail.com>.

I am slightly confused now. Time to live is used in networking: after n
hops, drop this packet. It is also used in memcache: expire this data n
seconds after insert.

I do not know of any specific TTL features in an RDBMS, so I do not understand
why someone would expect TTL to be permanently durable.

Re: TTL for cell values

Posted by Ian Varley <iv...@salesforce.com>.
> "I don't think anyone is well served by that kind of shallow analysis."


You're right, Andy; sorry if it came off sounding flip. My point was simply that the idea of a persistent data store with a configuration setting that makes the most current version of your data disappear without an explicit delete is very counter-intuitive for traditional database folks like me. Durability is the first, most inviolate rule, and this setting subverts it in a way that is (at least for me) not obvious at first, and differs drastically from the max versions setting. Maybe my confusion was due to the fact that I was looking for specific behavior (HBASE-4071, essentially). I totally see your point, though; putting it the way I did makes for a rather alarming pull quote. :(

I'm not at all suggesting we should alter the existing behavior (as if that were even possible at this point); this is a useful setting for data that's basically just a cache. But this is an area where the road from RDBMS to HBase might be a little bumpy for folks, and adding a new option would also have the advantage of making it even more clear what TTL is for. 

Ian



Re: TTL for cell values

Posted by Andrew Purtell <ap...@apache.org>.
> When I was talking to someone the other day about the current TTL policy, he was like "WTF, who would want that, it eats your data?"
 
I don't think anyone is well served by that kind of shallow analysis. 

The TTL feature was introduced for the convenience of having the system automatically garbage collect transient data. If you set a TTL on a column family, you are telling the system that the data shall expire after that interval elapses, that the data is only useful for the configured time period. If the data should not actually be considered transient, then configuring a TTL is the wrong thing to do -- at least currently.
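
Read that way, a column family with a TTL behaves much like an expiring cache. A minimal Python sketch of the semantics described above, where `TransientFamily` is a hypothetical stand-in and not the HBase API:

```python
from typing import Dict, Optional, Tuple


class TransientFamily:
    """Toy stand-in for a column family configured with a TTL."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl_seconds = ttl_seconds
        self._cells: Dict[str, Tuple[int, str]] = {}  # row -> (write time, value)

    def put(self, row: str, value: str, now: int) -> None:
        self._cells[row] = (now, value)

    def get(self, row: str, now: int) -> Optional[str]:
        entry = self._cells.get(row)
        if entry is None:
            return None
        written_at, value = entry
        # Expiry depends only on age; being the sole value gives no protection.
        if now - written_at > self.ttl_seconds:
            return None
        return value


if __name__ == "__main__":
    cf = TransientFamily(ttl_seconds=60)
    cf.put("r1", "session-token", now=0)
    print(cf.get("r1", now=30))   # still within the TTL
    print(cf.get("r1", now=61))   # expired without any explicit delete
```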

> "TTL except for most recent"

HBASE-4071 is a useful and good idea.

Best regards,


- Andy


Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: TTL for cell values

Posted by Ian Varley <iv...@salesforce.com>.
So, what you're saying is:

http://lmgtfy.com/?q=hbase+ttl+remove+all+versions+except+most+recent

:)

I like the idea of making this pluggable (via the coprocessor framework, or otherwise). But I also think this is a fundamental enough policy option that making it hard-coded might be a good idea. When I was talking to someone the other day about the current TTL policy, he was like, "WTF, who would want that, it eats your data?". There's no such thing as a "keep 0 versions" option, and thus no way to accidentally lose your most current data using that approach. But with the TTL version there is, which is (IMO) counter-intuitive for those coming from an RDBMS background.
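
A short Python sketch of why the max-versions approach cannot lose the current value. The helper `prune_by_max_versions` is hypothetical, and rejecting values below 1 simply mirrors the claim above that there is no "keep 0 versions" option:

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp, value)


def prune_by_max_versions(versions: List[Version],
                          max_versions: int) -> List[Version]:
    """Keep the newest `max_versions` versions of a cell, newest first."""
    if max_versions < 1:
        # Assumption for this sketch: "keep 0 versions" is not a valid setting.
        raise ValueError("max_versions must be at least 1")
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return newest_first[:max_versions]


if __name__ == "__main__":
    cell = [(1, "a"), (2, "b"), (3, "c")]
    print(prune_by_max_versions(cell, 1))
    print(prune_by_max_versions(cell, 2))
```

However old the data gets, count-based pruning of a non-empty cell always leaves at least the newest version, whereas age-based pruning can leave nothing.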

Commented thusly in the JIRA. :)

Ian



Re: TTL for cell values

Posted by lars hofhansl <lh...@yahoo.com>.
Hey Ian, (how are things :)

I just stumbled across https://issues.apache.org/jira/browse/HBASE-4071.

-- Lars

