Posted to user@hbase.apache.org by Jon Schutz <Jo...@youramigo.com> on 2009/06/30 04:21:58 UTC

TTL, Versions and storing long history

How do TTL and Versions specifications interact?  I'm guessing that the
first limit reached applies, i.e. if TTL is 1 week and versions is 3,
adding a fourth update to a data record would cause the first to be
bumped even if it is less than a week old?  And if I only have 2
versions but one is 2 weeks old, the expired one gets bumped even though
the versions limit has not been reached?

Is there a way to say "Keep versions < x weeks old, but always keep at
least the latest version, no matter how old?"

Suppose I want to keep the history about a particular object forever.
Looks like TTL can be set to 'Forever' (-1) but Versions has no
'infinite' setting - I guess that's OK as in practice MAXINT is "big
enough".  Would it be wise to use HBase like this to maintain a history,
or should I be adding a time component into the key and storing multiple
records?  Can anyone help outline the pros and cons?

Thanks,


-- 
Jon Schutz                        My tech notes http://notes.jschutz.net
Chief Technology Officer                        http://www.youramigo.com
YourAmigo         


Re: TTL, Versions and storing long history

Posted by Jonathan Gray <jl...@streamy.com>.
Yes, this is a reference to HEAD of trunk.

It is sufficiently stable for development and assessment.  Not currently
recommended for production until the release.

We are hoping to have an RC out next week.  Some of the larger users will
be putting that into production almost immediately, barring any major
issues.

JG

On Mon, June 29, 2009 11:05 pm, Jon Schutz wrote:
> Thanks Jonathan, that advice is helpful.
>
>
> I've seen 0.20 mentioned a few times on the list - is this a reference
> to current SVN HEAD, and if so is it considered sufficiently stable to be
> deployable?


Re: TTL, Versions and storing long history

Posted by Jon Schutz <jo...@youramigo.com>.
Thanks Jonathan, that advice is helpful.

I've seen 0.20 mentioned a few times on the list - is this a reference
to current SVN HEAD, and if so is it considered sufficiently stable to
be deployable?

-- 

Jon Schutz 			My tech notes http://notes.jschutz.net
Chief Technology Officer 	http://www.youramigo.com
YourAmigo





Re: TTL, Versions and storing long history

Posted by Jonathan Gray <jl...@streamy.com>.
Jon,

Prior to 0.20, I would definitely recommend moving the time component to 
the keys, columns, and values.  Even after 0.20, I recommend doing that 
if you want complete control.  My personal philosophy is that versions 
are for versioning, and if you are really using them as a time dimension 
of individual data points, you should consider not using versions.
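
As a rough illustration of moving the time component into the row key, here is a minimal sketch in plain Python. It does not use any HBase API; the function name history_row_key, the key layout (object id, a NUL separator, then a reversed 8-byte timestamp) and the sample id "obj42" are all assumptions made for the example. It also assumes object ids never contain a NUL byte.

```python
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value, same as a Java long


def history_row_key(object_id: str, ts_millis: int) -> bytes:
    """Build a row key of the form <object_id> 0x00 <reversed timestamp>.

    Hypothetical layout for illustration only.
    """
    if not 0 <= ts_millis <= MAX_TS:
        raise ValueError("timestamp out of range")
    # Big-endian packing makes byte-wise order match numeric order;
    # subtracting from MAX_TS makes newer entries sort first.
    reversed_ts = struct.pack(">Q", MAX_TS - ts_millis)
    return object_id.encode("utf-8") + b"\x00" + reversed_ts


# Row keys are compared as raw bytes, so a plain sort shows the scan order.
keys = [history_row_key("obj42", ts) for ts in (1000, 3000, 2000)]
newest_first = sorted(keys)
```

With a layout like this, every update becomes its own row, so nothing depends on the column family's versions or TTL settings, and a scan starting at the object's prefix returns the most recent entry first.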

However, in 0.20 the API and server-side implementation for versions are 
greatly improved.  You can specify timestamps manually and you can query 
for any time range you want, in both gets and scans.

There is not currently a way to keep versions < x weeks old but always 
keep the latest version.  If you wanted to enforce something like that, 
you could always write a MapReduce job that ran periodically and 
enforced what you wanted.
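
The rule such a periodic job would enforce can be sketched as a pure function. This is only the per-cell decision, written in plain Python with no HBase or MapReduce calls; the name versions_to_keep, the (timestamp, value) tuple shape and the millisecond units are assumptions made for the example.

```python
from typing import List, Tuple

Version = Tuple[int, bytes]  # (timestamp in milliseconds, value)


def versions_to_keep(versions: List[Version], now_ms: int,
                     max_age_ms: int) -> List[Version]:
    """Keep versions newer than the cutoff, plus the latest one regardless of age."""
    if not versions:
        return []
    # Newest first, so the first element is always the latest version.
    ordered = sorted(versions, key=lambda v: v[0], reverse=True)
    cutoff = now_ms - max_age_ms
    fresh = [v for v in ordered if v[0] > cutoff]
    # If everything has aged out, fall back to the single latest version.
    return fresh if fresh else ordered[:1]


WEEK_MS = 7 * 24 * 3600 * 1000
now = 10 * WEEK_MS
old_only = [(now - 3 * WEEK_MS, b"a"), (now - 2 * WEEK_MS, b"b")]
kept = versions_to_keep(old_only, now, WEEK_MS)
```

In this example both versions are older than a week, so only the newer of the two survives. A real job would delete whatever this function does not return; how it reads and deletes cells is left out here.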

If you want to keep history forever, the idea is to use the "big enough" 
values.  In practice, only since HBase 0.20 have we been able to handle 
millions of versions of a single column (Integer.MAX_VALUE is >2 
billion, far beyond the capabilities of HBase).  The same goes for 
TTL... 2 billion seconds is over 60 years.  Could also move everything 
to Long which would ensure there would never be an issue.  Will dig more 
and let you know.
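
The "over 60 years" figure can be checked with a few lines of arithmetic. The sketch below assumes TTL is counted in seconds and stored as a signed 32-bit integer, as the paragraph above implies, and uses a 365.25-day year.

```python
INT_MAX = 2**31 - 1                     # Java's Integer.MAX_VALUE
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # 365.25-day year

# Longest TTL expressible in a signed 32-bit count of seconds, in years.
ttl_years = INT_MAX / SECONDS_PER_YEAR
print(f"{ttl_years:.1f} years")
```

The result comes out at roughly 68 years, consistent with the estimate above.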

In any case, you'll need 0.20 to fully take advantage of versions.

Hope that helps.

JG
