Posted to user@hbase.apache.org by "S. Zhou" <my...@yahoo.com> on 2014/03/14 21:45:05 UTC

HBase timestamp consistency across multiple region servers?

Here is what I am trying to figure out: in the same table, if cell A is updated after cell B, is it guaranteed that the time stamp of cell A is always bigger than the time stamp of cell B, even if cell A and cell B are stored on different machines (which might therefore be out of sync on time)?

The reason I am asking this question is: I want to use timestamps to order the updates by time. These updates are issued from multiple machines. I was thinking of using a global counter (stored in a separate HBase table), but I guess that counter table might become a hot spot since each update needs to update this table.

My general problem is: I want to sort the updates stored in HBase from multiple machines. Please let me know if you have any good ideas.

Thanks a lot
Senqiang

Re: HBase timestamp consistency across multiple region servers?

Posted by lars hofhansl <la...@apache.org>.
In addition to what Ted and Andrew said...

If cell A is updated after cell B *and* the application does not set any timestamps *and* cells A and B are part of the same row (i.e. have the same row key), then the timestamp of cell A will be greater than cell B's. If cells A and B are not related by row (or by an alternate split policy), there is no guarantee (beyond the precision you can get from NTP).
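The cross-server case can be made concrete with a small simulation. This is a hypothetical sketch, not HBase code: each simulated server stamps a write with its own wall clock (as a region server does when the client sets no timestamp), and the 50 ms offset is an arbitrary stand-in for residual skew that NTP did not remove.

```python
"""Hypothetical simulation of two region servers whose clocks disagree.

Nothing here calls HBase; SimulatedServer is an illustrative stand-in.
"""
from dataclasses import dataclass


@dataclass
class SimulatedServer:
    name: str
    clock_offset_ms: int  # assumption: how far this server's clock is from true time

    def stamp(self, true_time_ms: int) -> int:
        # The server only sees its own clock, never the "true" time.
        return true_time_ms + self.clock_offset_ms


server_1 = SimulatedServer("rs1", clock_offset_ms=0)
server_2 = SimulatedServer("rs2", clock_offset_ms=-50)  # 50 ms behind

# Cell B is written first (true time 1000 ms) to a row hosted on rs1;
# cell A is written 10 ms later to a different row hosted on rs2.
ts_b = server_1.stamp(1000)
ts_a = server_2.stamp(1010)

print(ts_a, ts_b)    # 960 1000
print(ts_a > ts_b)   # False: A happened later but carries an older timestamp
```

Within a single row there is only one server clock involved, which is why the same-row case above behaves differently.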

If you do not mind the plug... These blog posts might help clarify the issues:
http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html
http://hadoop-hbase.blogspot.com/2012/09/keyvalue-explicit-timestamps-vs.html
http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html

-- Lars




Re: HBase timestamp consistency across multiple region servers?

Posted by Andrew Purtell <ap...@apache.org>.
This is kind of a Y answer to an X-Y question.

> I want to use time stamp to order the updates by time. These updates
> are issued from multiple machines.

> I was thinking to use global counter (stored in a separated HBase
> table) but I guess that counter table might become a hot spot since
> each update needs to update this table.

There are two possible answers to this question as posed.

1. You want HBase to order your updates by timestamp. This happens
naturally.

It is already strongly recommended that you run NTP on all of your HBase
servers as a matter of good distributed system hygiene. If you don't
specify an explicit timestamp in your mutations, HBase will use the
latest server time when persisting your values, and you will have updates
ordered by time.
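A toy model can illustrate what "ordered by time" means for one cell. It is a hypothetical sketch, not the HBase client API: ToyCell, its methods and the fake clock are assumptions. When no timestamp is given the "server" fills in its own clock reading, versions are kept newest-first (the order HBase returns them in), and an explicitly supplied timestamp sorts by its value, not by when it was written.

```python
"""Toy model (not the HBase API) of a single cell's version history."""
import itertools
from typing import Optional


class ToyCell:
    def __init__(self, clock):
        self._clock = clock     # hypothetical callable returning server time in ms
        self._versions = []     # list of (timestamp, value)

    def put(self, value: str, timestamp: Optional[int] = None) -> int:
        # Assumption mirrored from the post: no client timestamp -> use server time.
        ts = self._clock() if timestamp is None else timestamp
        self._versions.append((ts, value))
        # Keep versions newest-first by timestamp.
        self._versions.sort(key=lambda v: v[0], reverse=True)
        return ts

    def get(self) -> str:
        # A default read sees only the newest version.
        return self._versions[0][1]

    def versions(self):
        return list(self._versions)


ticks = itertools.count(start=1000, step=5)   # fake, monotonic server clock
cell = ToyCell(clock=lambda: next(ticks))

cell.put("v1")                   # stamped 1000 by the "server"
cell.put("v2")                   # stamped 1005
cell.put("old", timestamp=10)    # explicit timestamp: sorts as the oldest
print(cell.get())                # v2
print(cell.versions())           # [(1005, 'v2'), (1000, 'v1'), (10, 'old')]
```

The last put shows why application-set timestamps can defeat time ordering: the value written most recently is not the one a default read returns.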


2. You want to retrieve updates by timestamp. In other words, you don't
merely want HBase to order updates by time; you also want a time
component as the row key or as part of a composite row key.
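For a time component in the row key to sort correctly, it has to be encoded so that byte order matches time order (HBase compares row keys as unsigned bytes). A key that starts with the time also sends every new write to the same region. The sketch below shows one mitigation, salting, in the spirit of Phoenix salted tables or HBaseWD but not using either: the 1-byte salt, NUM_BUCKETS = 4, the CRC32 hash and all helper names are assumptions. A reader scans each bucket separately and merges the results by timestamp.

```python
"""Hypothetical salted, time-ordered row keys (not Phoenix or HBaseWD code).

Assumed key layout: 1 salt byte + 8-byte big-endian timestamp + writer id.
"""
import heapq
import struct
import zlib

NUM_BUCKETS = 4  # assumption: number of salt buckets (e.g. pre-split regions)


def make_row_key(timestamp_ms: int, writer_id: str) -> bytes:
    suffix = writer_id.encode("utf-8")
    ts_bytes = struct.pack(">Q", timestamp_ms)  # unsigned 64-bit, big-endian
    # Derive the salt from the rest of the key so a given update always
    # maps to the same bucket, while consecutive timestamps spread out.
    salt = zlib.crc32(ts_bytes + suffix) % NUM_BUCKETS
    return bytes([salt]) + ts_bytes + suffix


def timestamp_of(row_key: bytes) -> int:
    return struct.unpack(">Q", row_key[1:9])[0]


def merged_scan(buckets):
    """Merge per-bucket scans (each sorted) into one time-ordered stream."""
    # Within a bucket the salt byte is constant, so sorting by the whole key
    # is the same as sorting by key[1:], which starts with the timestamp.
    return heapq.merge(*buckets, key=lambda k: k[1:])


# Simulate writes from two machines, then "scan" each bucket separately.
writes = [(30, "m1"), (10, "m2"), (20, "m1"), (40, "m2")]
keys = [make_row_key(ts, w) for ts, w in writes]
buckets = [sorted(k for k in keys if k[0] == b) for b in range(NUM_BUCKETS)]
ordered = [timestamp_of(k) for k in merged_scan(buckets)]
print(ordered)  # [10, 20, 30, 40]
```

The trade-off is on the read side: a time-range query becomes NUM_BUCKETS scans plus a merge, in exchange for spreading writes across regions.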

There are several schema design solutions to this. You can use Apache
Phoenix with salted keys. You can use Sematext's HBaseWD library. You can
use a separate distributed process for time ordered keys (strictly
speaking, k-ordered) such as Twitter's Snowflake. Choose one that looks
like it would work best for your use case.
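The separate-process option can be sketched as a Snowflake-style generator. This is not Twitter's code: the 41/10/12-bit split (timestamp, worker id, sequence) follows the commonly described Snowflake layout, while the epoch, class and function names are assumptions. Each worker embeds its own clock reading, so IDs from one worker are strictly increasing, while IDs from different workers are only k-ordered (roughly time-ordered, within the skew between their clocks). No shared counter is updated per write.

```python
"""Hypothetical Snowflake-style id generator (not Twitter's implementation).

Assumed layout: milliseconds since a custom epoch in the high bits
(41 bits is about 69 years), then a 10-bit worker id, then a 12-bit sequence.
"""
import threading
import time

CUSTOM_EPOCH_MS = 1_388_534_400_000  # assumption: 2014-01-01 UTC; any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeIdGenerator:
    """One instance per worker process; worker ids must be unique."""

    def __init__(self, worker_id, clock_ms=None):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in WORKER_BITS bits")
        self.worker_id = worker_id
        # Default clock: this machine's wall clock in milliseconds.
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Refuse to emit an id that would sort before an earlier one.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self._sequence)


def timestamp_ms_of(snowflake_id):
    """Recover the embedded wall-clock time (ms since the Unix epoch)."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS


# Usage with a fake clock so the result is deterministic.
fake_now = [CUSTOM_EPOCH_MS + 1000]
gen = SnowflakeLikeIdGenerator(worker_id=7, clock_ms=lambda: fake_now[0])
a = gen.next_id()
b = gen.next_id()   # same millisecond: sequence goes 0 -> 1
fake_now[0] += 1
c = gen.next_id()   # next millisecond: sequence resets to 0
print(a < b < c)                              # True
print(timestamp_ms_of(c) - CUSTOM_EPOCH_MS)   # 1001
```

Such an id could then serve as (part of) the row key; it does not remove clock skew between workers, it only bounds its effect on ordering.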



On Fri, Mar 14, 2014 at 2:01 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. if cell A is updated after cell B, is it guaranteed that the time stamp
> of cell A is always bigger than the time stamp of cell B
>
> As you mentioned, machines might be out of sync on time, so the above may
> not always be true.



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase timestamp consistency across multiple region servers?

Posted by Ted Yu <yu...@gmail.com>.
bq. if cell A is updated after cell B, is it guaranteed that the time stamp
of cell A is always bigger than the time stamp of cell B

As you mentioned, machines might be out of sync on time, so the above may
not always be true.

