
Posted to user@hbase.apache.org by Wes Chow <we...@s7labs.com> on 2009/04/01 15:56:15 UTC

timestamp uses

So far, few if any of the schema designs I've come across have really 
talked about using the timestamp field and HBase's automatic deletion of 
old cells in a smart way.

What is the timestamp typically used for? Snapshotting? Implementing 
more complicated transactions than HBase natively supports?


Wes

Re: timestamp uses

Posted by Erik Holstad <er...@gmail.com>.

Hi Genady!
If everything goes as planned, 0.20 will let you pass a TimeRange to every
get query, so you will be able to ask for all data from row r, family f and
column c in the time range t2 to t1. The nice thing about the new
implementation is that the query no longer has to go through all the
storefiles: it can stop once it reaches the storefiles whose data is older
than t1, so it is also going to be faster than before.
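
To make the storefile-skipping idea concrete, here is a minimal sketch of it as a toy model. It is not HBase code: the classes Cell, StoreFile and Result, the get method, the half-open range [minStamp, maxStamp), and the assumption that files are ordered newest first without overlapping in time are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: every class and method here is hypothetical, not the HBase client API.
final class TimeRangeSketch {

    /** One versioned value of a single row/family/column. */
    static final class Cell {
        final long timestamp;
        final String value;
        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** A store file that records the oldest and newest timestamp it holds. */
    static final class StoreFile {
        final long minTimestamp;
        final long maxTimestamp;
        final List<Cell> cells;
        StoreFile(Cell... cells) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (Cell c : cells) {
                min = Math.min(min, c.timestamp);
                max = Math.max(max, c.timestamp);
            }
            this.minTimestamp = min;
            this.maxTimestamp = max;
            this.cells = Arrays.asList(cells);
        }
    }

    /** Matching values plus the number of files that had to be scanned. */
    static final class Result {
        final List<String> values = new ArrayList<>();
        int filesScanned = 0;
    }

    /**
     * Returns values with minStamp <= timestamp < maxStamp.
     * Assumes files are ordered newest first and do not overlap in time.
     */
    static Result get(List<StoreFile> newestFirst, long minStamp, long maxStamp) {
        Result result = new Result();
        for (StoreFile file : newestFirst) {
            if (file.maxTimestamp < minStamp) {
                break; // this file and every later one is older than the range
            }
            if (file.minTimestamp >= maxStamp) {
                continue; // entirely newer than the range, nothing to read
            }
            result.filesScanned++;
            for (Cell cell : file.cells) {
                if (cell.timestamp >= minStamp && cell.timestamp < maxStamp) {
                    result.values.add(cell.value);
                }
            }
        }
        return result;
    }
}
```

With three files covering timestamps 40-50, 20-30 and 10, a query for the range 25 to 45 scans only the first two files and never opens the oldest one.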

Regards Erik

Re: timestamp uses

Posted by Bradford Cross <br...@gmail.com>.

I have another thread in progress re using HBase as a financial time series
database.

http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200904.mbox/%3Cea7d6a710904011948l2a79bf18hfbc7a6102676b5f3@mail.gmail.com%3E



On Fri, Apr 3, 2009 at 8:38 AM, Jim Kellerman (POWERSET) <
Jim.Kellerman@microsoft.com> wrote:

> There are a number of Jiras open to address this issue.
> See HBASE-33, HBASE-52 and HBASE-1182
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)

RE: timestamp uses

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.

There are a number of Jiras open to address this issue.
See HBASE-33, HBASE-52 and HBASE-1182

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


RE: timestamp uses

Posted by Genady <ge...@exelate.com>.

Jonathan,

Please correct me if I'm wrong, but one feature that HBase is obviously
missing is the ability to select records by timestamp range (week,
month, etc.). As far as I understand, it's possible to select by
specific timestamps, but in most cases you want to select ranges. To
work around this there is always the option of putting the time/date in the
row key, but in most designs you can't do that.

Thanks,
Gennady



RE: timestamp uses

Posted by Jonathan Gray <jl...@streamy.com>.

Wes,

The timestamp is used for versioning.

There have been arguments recently around the 0.20 changes about whether the
user should be allowed to set this stamp manually or whether it should always
be generated server-side according to NOW.

Currently the decision has been made to allow the user to manually set the
stamp on insertion, to any stamp at or before now (but not in the future).
This is so we can ensure when doing a flush that no entries in the storefile
will have a stamp that is later than the flush stamp.
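
As a rough illustration of that rule (not the actual server code), the check could look like the sketch below. The class name, both method names, and the choice to signal a rejected stamp with an IllegalArgumentException are assumptions made for this example.

```java
// Toy model only: hypothetical names, not HBase's server-side code.
final class StampCheckSketch {

    /** Picks the timestamp to store: the server clock when the user gave none. */
    static long resolveTimestamp(Long userStamp, long serverNow) {
        if (userStamp == null) {
            return serverNow; // default: generated server-side
        }
        if (userStamp > serverNow) {
            // stamps in the future are refused; at or before now is accepted
            throw new IllegalArgumentException(
                "timestamp " + userStamp + " is in the future (now=" + serverNow + ")");
        }
        return userStamp;
    }

    /** The property the rule protects: no stored stamp is later than the flush stamp. */
    static boolean flushInvariantHolds(long[] storedStamps, long flushStamp) {
        for (long stamp : storedStamps) {
            if (stamp > flushStamp) {
                return false;
            }
        }
        return true;
    }
}
```

Because every accepted stamp is at or before the server's clock at insertion time, a flush stamp taken afterwards is never earlier than anything already stored.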

In the canonical use case for HBase, web crawling, timestamps are used to
version and date each crawl.  You could then set HBase to keep the 10 most
recent versions and older ones would be deleted on major compactions.
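
The retention behaviour can be pictured with a small in-memory model of one cell. This is only a sketch: the class VersionRetentionSketch and its methods are made up here, and real compactions work on storefiles rather than on a map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

// Toy model only: a hypothetical stand-in for the versions of a single cell.
final class VersionRetentionSketch {
    private final int maxVersions;
    // Timestamp -> value, ordered newest timestamp first.
    private final TreeMap<Long, String> versions =
        new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    VersionRetentionSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Writes one version; older versions stay until the next compaction. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Mimics a major compaction: keeps only the newest maxVersions entries. */
    void majorCompact() {
        int kept = 0;
        Iterator<Long> it = versions.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            kept++;
            if (kept > maxVersions) {
                it.remove(); // beyond the retention limit, drop it
            }
        }
    }

    /** Stored timestamps, newest first. */
    List<Long> timestamps() {
        return new ArrayList<>(versions.keySet());
    }
}
```

With a limit of 2 and four crawls stamped 1 through 4, all four versions remain readable until majorCompact() runs, after which only 4 and 3 are left.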

At the other extreme, you could set the timestamp yourself, so that each
individual column in a family becomes a time-ordered list of whatever you
want.  In practice, however, I've found that it makes more sense to encode
stamps in your row keys or column names.
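
One way to encode a stamp in a row key is sketched below, purely as an illustration. The key layout (an id followed by an 8-byte big-endian timestamp), the class RowKeySketch and its methods are assumptions of this example, and it presumes non-negative timestamps and ids of equal length so that keys for different ids do not interleave.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy model only: a hypothetical key layout, not something HBase prescribes.
final class RowKeySketch {

    /** Builds "<id bytes><8-byte big-endian timestamp>". */
    static byte[] rowKey(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
            .put(id)
            .putLong(timestampMillis) // ByteBuffer writes big-endian by default
            .array();
    }

    /** Unsigned lexicographic byte comparison, the order in which row keys sort. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

Since keys for one id sort chronologically, a time range t1 to t2 turns into the key range from rowKey(id, t1) up to rowKey(id, t2), which can be read with an ordinary scan over row keys.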

Hope that helps.

JG
