Posted to common-user@hadoop.apache.org by edward choi <mp...@gmail.com> on 2011/07/22 09:18:29 UTC

Why use "Reverse Timestamp" as the Row Key?

Hi,
I was studying HBase with "Hadoop: The Definitive Guide".
There was a schema example that used "Group Id + Reverse Timestamp" as the
row key.
This way rows of the same group will be located near one another in the table.
Plus, within the same group, rows will be sorted so that the most recently
inserted row comes first.

The part I don't understand is, what is the advantage of using "Reverse
Timestamp" instead of just "Timestamp"?
Why place the newest row on the top?
I thought that in HBase, keys are looked up by binary search. And in binary
search, the chronological order has no effect (at least that's how I
understand it).
So why add an extra step to reverse the timestamp?

Any explanation will be much appreciated.

Ed.

Re: Why use "Reverse Timestamp" as the Row Key?

Posted by Marc Sturlese <ma...@gmail.com>.
This is normally useful for lots of web apps. Sorting in HBase is done at
insert time, not when scanning. Using a reversed timestamp, you ensure the
most recent activity of the user will be shown first.
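
To make the ordering concrete, here is a minimal sketch in plain Java of how such a key could be built. It does not use the HBase client library; the class and method names are made up for illustration, and it assumes the common convention of computing the reverse timestamp as Long.MAX_VALUE minus the timestamp, with group ids of equal length. HBase orders row keys as unsigned bytes, which compareKeys imitates.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase.
final class ReverseTimestampKey {

    // Builds <groupId><reverse-timestamp>. The reverse timestamp is
    // Long.MAX_VALUE - timestampMillis, stored as 8 big-endian bytes.
    // Assumes timestampMillis >= 0 and group ids of one fixed length.
    static byte[] rowKey(String groupId, long timestampMillis) {
        byte[] group = groupId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(group.length + Long.BYTES)
                .put(group)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned, byte-by-byte comparison; a shorter key that is a prefix
    // of a longer one sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

With these keys, a row written later for the same group compares lower than a row written earlier, so it is stored ahead of it, while rows of different groups still stay grouped by the leading group id.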


Re: Why use "Reverse Timestamp" as the Row Key?

Posted by Edward Choi <mp...@gmail.com>.
Thanks for clearing that up!

Ed

On 2011. 7. 22., at 11:03 PM, Ted Yu <yu...@gmail.com> wrote:

> That's right.

Re: Why use "Reverse Timestamp" as the Row Key?

Posted by Ted Yu <yu...@gmail.com>.
That's right.

On Fri, Jul 22, 2011 at 7:01 AM, edward choi <mp...@gmail.com> wrote:

> Thanks for the explanation.
>
> So if I don't care whether the newest row is on top when doing a Scan,
> then I don't need to bother using a Reverse Timestamp in the Row Key?
>
> For example, I am collecting news articles on a daily basis.
> And each article is stored in HBase, using "YearMonthDate + Title Hash" as
> the Row Key.
> I don't care how the articles are sorted as long as they are grouped by
> YearMonthDate.
> In this case, I don't need Reverse Timestamp.
> Am I right on this one?
>
> Ed

Re: Why use "Reverse Timestamp" as the Row Key?

Posted by edward choi <mp...@gmail.com>.
Thanks for the explanation.

So if I don't care whether the newest row is on top when doing a Scan,
then I don't need to bother using a Reverse Timestamp in the Row Key?

For example, I am collecting news articles on a daily basis.
And each article is stored in HBase, using "YearMonthDate + Title Hash" as
the Row Key.
I don't care how the articles are sorted as long as they are grouped by
YearMonthDate.
In this case, I don't need Reverse Timestamp.
Am I right on this one?

Ed
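
A rough sketch of the "YearMonthDate + Title Hash" key described above, in plain Java. The class and method names are invented for illustration, MD5 is only an assumed choice of hash (the message does not say which one is used), and a sorted TreeMap stands in for the table. Because every key of one day shares the same eight-character prefix, all articles of that day fall into one contiguous key range, whatever order the hashes put them in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase.
final class DailyArticleKey {

    // yyyyMMdd followed by the hex MD5 of the title (MD5 is an assumption).
    static String rowKey(LocalDate date, String title) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE) + md5Hex(title);
    }

    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // All rows of one day: keys in the range [yyyyMMdd, next day's yyyyMMdd).
    static SortedMap<String, String> articlesOn(TreeMap<String, String> table,
                                                LocalDate date) {
        String start = date.format(DateTimeFormatter.BASIC_ISO_DATE);
        String stop = date.plusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE);
        return table.subMap(start, stop);
    }
}
```

Within one day the rows end up ordered by hash, which is effectively arbitrary; that matches the stated requirement that only the grouping by date matters.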

2011/7/22 Doug Meil <do...@explorysmedical.com>

>
> It's so that you can get the most recent entry with a Scan, assuming that
> the key structure (as explained in the book) is something like
> <thing><rev-timestamp> and you are trying to quickly find the most
> recent entry for <thing>.

Re: Why use "Reverse Timestamp" as the Row Key?

Posted by Doug Meil <do...@explorysmedical.com>.
It's so that you can get the most recent entry with a Scan, assuming that
the key structure (as explained in the book) is something like
<thing><rev-timestamp> and you are trying to quickly find the most
recent entry for <thing>.
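
As an illustration of that point, the sketch below imitates the lookup with a sorted in-memory map instead of a real HBase table. Everything in it is an assumption made for the example: the class and method names, the ":" separator, and the zero-padded decimal encoding of Long.MAX_VALUE minus the timestamp. The latest() method plays the role of a Scan that starts at <thing> and stops after the first row.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a table whose rows are kept sorted by key.
final class LatestEntryScan {

    private final TreeMap<String, String> rows = new TreeMap<>();

    // <thing>:<rev-timestamp>, with the reverse timestamp zero-padded to
    // 19 digits so that string order matches numeric order.
    static String rowKey(String thing, long timestampMillis) {
        return thing + ":"
                + String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - timestampMillis);
    }

    void put(String thing, long timestampMillis, String value) {
        rows.put(rowKey(thing, timestampMillis), value);
    }

    // Seeks to the first key at or after "<thing>:" and reads one row.
    // Returns null when <thing> has no rows.
    String latest(String thing) {
        String prefix = thing + ":";
        Map.Entry<String, String> first = rows.ceilingEntry(prefix);
        if (first == null || !first.getKey().startsWith(prefix)) {
            return null;
        }
        return first.getValue();
    }
}
```

With plain ascending timestamps the newest row would instead be the last one of its group, so a forward scan starting at <thing> would have to read past every older row before reaching it.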





