Posted to user@hbase.apache.org by Ben West <bw...@yahoo.com> on 2011/10/19 21:18:38 UTC

Custom timestamps

Hi all,

We're storing timestamped data in HBase; from lurking on the mailing list it seems like the recommendation is usually to make the timestamp part of the row key. I'm curious why this is - is scanning over rows more efficient than scanning over timestamps within a cell? 

The book says: "the version timestamp is [used] internally by HBase for things like time-to-live calculations. It's usually best to avoid setting this timestamp yourself. Prefer using a separate timestamp attribute of the row, or have the timestamp a part of the rowkey, or both." I understand that TTL would be ruined (or saved, depending on your goal) by custom timestamps, and I also gather that HBase handles concurrency through MVCC. But we are using application-level locks, and having HBase's TTL apply to our own timestamps is a bonus, if anything.
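To make the TTL interaction concrete, here is a minimal sketch in Python (plain Python, not HBase client code). It assumes the usual rule that a cell counts as expired once its version timestamp is older than the column family's TTL. The function name is_expired, the 7-day TTL, and the sample times are made up for illustration.

```python
def is_expired(cell_ts_ms: int, ttl_seconds: int, now_ms: int) -> bool:
    """Return True if a cell stamped with cell_ts_ms is older than the TTL."""
    return now_ms - cell_ts_ms > ttl_seconds * 1000


DAY_MS = 24 * 60 * 60 * 1000
now_ms = 1_319_058_000_000        # an arbitrary "current" time in epoch millis
ttl_seconds = 7 * 24 * 60 * 60    # hypothetical 7-day TTL

# A cell stamped with an event time from 30 days ago is already past the
# TTL at the moment it is written.
backdated = is_expired(now_ms - 30 * DAY_MS, ttl_seconds, now_ms)

# A cell stamped with the current time gets the full TTL window.
fresh = is_expired(now_ms, ttl_seconds, now_ms)

print(backdated, fresh)
```

Whether that counts as "ruined" or "saved" depends on whether you want data to age out by event time or by write time.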

So is there any reason why we shouldn't set the timestamps manually?

Thanks!
-Ben

Re: Custom timestamps

Posted by Doug Meil <do...@explorysmedical.com>.
Stack, he might be referring to this...

http://hbase.apache.org/book.html#versions

... I updated this recently based on an exchange with JD and somebody else.





On 10/21/11 1:08 AM, "Stuti Awasthi" <st...@hcl.com> wrote:

>Hi St. Ack,
>
>I read something while browsing. Right now I don't have the link, but if I
>come across something similar, I will let you know. Thanks for the info. It
>is really a big relief :)


RE: Custom timestamps

Posted by Stuti Awasthi <st...@hcl.com>.
Hi St. Ack,

I read something while browsing. Right now I don't have the link, but if I come across something similar, I will let you know. Thanks for the info. It is really a big relief :)

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Thursday, October 20, 2011 9:27 PM
To: user@hbase.apache.org
Subject: Re: Custom timestamps

::DISCLAIMER::
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
received this email in error please delete it and notify the sender immediately. Before opening any mail and
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

Re: Custom timestamps

Posted by Stack <st...@duboce.net>.
On Wed, Oct 19, 2011 at 9:59 PM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi St. Ack, Ben
>
> I also have a scenario in which I have to take periodic backups of HBase data. For that I will be using the export/import tool. I have decided to take backups based on a time range interval. I have also read in some other posts that it is not a good idea to use the timestamp field of HBase.

What you are doing sounds fine to me.  Which posts say it is bad?  I'd
like to see what issues in particular they are referring to.
Thanks,
St.Ack
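
For a time-range backup like the one described above, the window boundaries are epoch milliseconds. The following is a rough Python sketch that builds the arguments for one UTC day, assuming the Export tool's usual argument order of table, output directory, versions, start time, end time. The table name, output path, and the helper name export_args are placeholders, and the command is only printed, not run.

```python
from datetime import datetime, timedelta, timezone


def export_args(table, out_dir, day, versions=1):
    """Build Export arguments covering one UTC day as [start, end) millis."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    start_ms = int(start.timestamp() * 1000)
    end_ms = int(end.timestamp() * 1000)
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.Export",
        table, out_dir, str(versions), str(start_ms), str(end_ms),
    ]


# Placeholder table and path; only the date drives the time window.
args = export_args("mytable", "/backup/mytable/2011-10-20",
                   datetime(2011, 10, 20))
print(" ".join(args))
```

Using half-open windows (start inclusive, end exclusive) means consecutive daily exports neither overlap nor leave gaps. The window selects on cell timestamps, so if the application sets them itself, the backup follows those values rather than the write time.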

RE: Custom timestamps

Posted by Stuti Awasthi <st...@hcl.com>.
Hi St. Ack, Ben

I also have a scenario in which I have to take periodic backups of HBase data. For that I will be using the export/import tool. I have decided to take backups based on a time range interval. I have also read in some other posts that it is not a good idea to use the timestamp field of HBase.
So far my POC works fine on my end, but I want to know: if I put the same scenario into production, will there be any issues to worry about?

Please suggest.

Re: Custom timestamps

Posted by Stack <st...@duboce.net>.
On Thu, Oct 20, 2011 at 8:11 AM, Ben West <bw...@yahoo.com> wrote:
> Actually, another question: are there issues with multiple puts having the same timestamp? I.e. I write a value with timestamp = today 12:00. I later change my mind and want to rewrite a different value but with the same timestamp. Would that present problems?
>

When you multiput, all the puts get the same timestamp.

There is no notion of 'changing' values in hbase.  You just add a new
version.  When hbase returns you the values, it will return them
ordered by timestamp (if the new timestamp is ahead of 12:00, it will
come out first; otherwise it comes out afterward).
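
As a toy illustration of the ordering described above (plain Python, not the HBase client), the sketch below models one cell as a map from version timestamp to value. It assumes that a put at exactly the same row, column, and timestamp takes the place of the earlier value on reads, which is how the versions section of the book describes overwriting. The class name ToyCell and the sample values are invented for the example.

```python
class ToyCell:
    """Toy model of a single cell: version timestamp -> value."""

    def __init__(self):
        self.versions = {}

    def put(self, ts, value):
        # Assumption: a write with an identical timestamp replaces the
        # earlier value instead of adding a second version.
        self.versions[ts] = value

    def get(self, max_versions=1):
        # Versions come back ordered by timestamp, newest first.
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:max_versions]


NOON = 1_319_112_000_000  # stand-in for "today 12:00" in epoch millis

cell = ToyCell()
cell.put(NOON, "first value")
cell.put(NOON, "second thoughts")   # same timestamp as the first put
cell.put(NOON - 60_000, "older")    # a version one minute earlier

print(cell.get(max_versions=3))
```

In this model the rewrite at 12:00 is what a reader sees, and the one-minute-older version sorts after it.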

St.Ack

Re: Custom timestamps

Posted by Ben West <bw...@yahoo.com>.
Actually, another question: are there issues with multiple puts having the same timestamp? I.e. I write a value with timestamp = today 12:00. I later change my mind and want to rewrite a different value but with the same timestamp. Would that present problems?

Thanks!
-Ben


----- Original Message -----
From: Ben West <bw...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Thursday, October 20, 2011 9:13 AM
Subject: Re: Custom timestamps

Thanks Stack. We are indeed using locks outside of HBase, but I hadn't heard about the problems with HBase's locks. Good to know.

-Ben


Re: Custom timestamps

Posted by Ben West <bw...@yahoo.com>.
Thanks Stack. We are indeed using locks outside of HBase, but I hadn't heard about the problems with HBase's locks. Good to know.

-Ben


----- Original Message -----
From: Stack <st...@duboce.net>
To: user@hbase.apache.org; Ben West <bw...@yahoo.com>
Cc: 
Sent: Wednesday, October 19, 2011 5:24 PM
Subject: Re: Custom timestamps


Re: Custom timestamps

Posted by Stack <st...@duboce.net>.
On Wed, Oct 19, 2011 at 12:18 PM, Ben West <bw...@yahoo.com> wrote:
> We're storing timestamped data in HBase; from lurking on the mailing list it seems like the recommendation is usually to make the timestamp part of the row key. I'm curious why this is - is scanning over rows more efficient than scanning over timestamps within a cell?
>

I'd be surprised if there were a noticeable difference.

It depends on how you need to access the data.  In the tsdb case, for
instance, it wants to get all metrics within a particular time range.
If the timestamp it used were the hbase cell timestamp, then you'd
have to do a full table scan each time to find metrics that had been
fired during a particular time period (i.e. you'd check each row to
see whether it has any entries for the time period you are interested
in), whereas if the timestamp is part of the row key, you just have to
start scanning at the opening of the time range you are querying.
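
A small Python sketch of why the row-key layout helps: it uses a sorted list of byte strings as a stand-in for the table's key order, with a made-up key layout of metric name, a zero byte, and an 8-byte big-endian timestamp. This is only loosely inspired by the tsdb case and is not its actual schema; row_key and scan are hypothetical helpers.

```python
import bisect
import struct


def row_key(metric: str, ts_seconds: int) -> bytes:
    # Metric name, a separator, then the timestamp as 8 big-endian bytes,
    # so byte-wise key order matches time order within one metric.
    return metric.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_seconds)


# A sorted list of row keys stands in for the table's sorted rows.
rows = sorted(row_key(m, t)
              for m in ("cpu", "mem")
              for t in range(1000, 1100, 10))


def scan(metric: str, start: int, stop: int) -> list:
    """Return keys for metric with start <= ts < stop, skipping other rows."""
    lo = bisect.bisect_left(rows, row_key(metric, start))
    hi = bisect.bisect_left(rows, row_key(metric, stop))
    return rows[lo:hi]


hits = scan("cpu", 1020, 1050)
print([struct.unpack(">Q", k[-8:])[0] for k in hits])
```

The scan jumps straight to the first key in the range and stops at the end of it. With the time only in the cell timestamp, every row would have to be visited and checked.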


> The book says: "the version timestamp is [used] internally by HBase for things like time-to-live calculations. It's usually best to avoid setting this timestamp yourself. Prefer using a separate timestamp attribute of the row, or have the timestamp a part of the rowkey, or both." I understand that TTL would be ruined (or saved, depending on your goal) by custom timestamps, and I also gather that HBase handles concurrency through MVCC. But we are using application-level locks, and having HBase's TTL apply to our own timestamps is a bonus, if anything.
>

The book's advice errs on the side of being conservative, I'd say.

The MVCC that we do internally does not use the cell timestamp but
instead a different running sequence number that is associated
(internally) with cells (I've not heard of an application atop hbase
using the hbase timestamps to do MVCC at the application level).

Are the locks you talk of the ones provided in the hbase HTable API?
If so, are you aware they are dangerous (see back in this mailing
list for an explanation)?

> So is there any reason why we shouldn't set the timestamps manually?
>

Generally, hbase works fine with user-set timestamps; there can be
issues ordering edits if clients have divergent clocks and the version
being set is time-based, but I'm probably not telling you anything you
don't already know.
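
The clock-divergence issue can be shown with a few lines of Python. This is a toy model, not HBase code: versions are (timestamp, value) pairs and a read returns the pair with the highest timestamp. The skew values and client names are invented.

```python
def latest(versions):
    """Return the value carrying the highest timestamp."""
    return max(versions, key=lambda v: v[0])[1]


TRUE_TIME_MS = 1_319_112_000_000

# Client A's clock runs 5 seconds fast; client B's clock is accurate.
skew_a_ms, skew_b_ms = 5_000, 0

versions = []
# A writes first in real time, stamping the edit with its fast clock.
versions.append((TRUE_TIME_MS + skew_a_ms, "written first by A"))
# B writes one second later in real time but produces a smaller timestamp.
versions.append((TRUE_TIME_MS + 1_000 + skew_b_ms, "written second by B"))

print(latest(versions))
```

Here the edit that happened first in real time is the one readers see as newest, because ordering follows the timestamps the clients supplied rather than arrival order.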

St.Ack