
Posted to user@hbase.apache.org by "Qingyan(Evan) Liu" <qi...@gmail.com> on 2009/07/13 20:08:53 UTC

slides about some case studies of hbase table schema design

Dear all,

I've just finished some slides about some cases of designing hbase
table schemas. Please have a look here:
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies

The cases are mostly collected from websites. I put the slides together
because I couldn't find a good guide on how to design HBase table
schemas.

Any suggestions or best practices that you want to share are welcome!

Thanks a lot!

sincerely,
Evan

Re: slides about some case studies of hbase table schema design

Posted by "Qingyan(Evan) Liu" <qi...@gmail.com>.

Hi Jonathan,

What we need is ascending order. The problem that Schubert pointed out
is that when we insert time series data into the table, only the last
tablet (a region, in HBase terms) is busy, which is slower than
inserting into several tablets simultaneously.

sincerely,
Evan
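
To make the hot spot concrete, here is a minimal sketch in plain Java
(no HBase client involved). It assumes a table already split into key
ranges at a few hypothetical split points; the class and method names
are made up for illustration. Because every new timestamp is larger
than every existing key, each write lands in the last range:

```java
import java.util.Arrays;

/** Illustrative only: models a sorted table as key ranges separated by split points. */
final class HotSpotSketch {

    /** Index of the range that holds the key, given sorted split points. */
    static int regionFor(long[] splitPoints, long key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index i: the key is the first key of range i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    /** Counts how many of the given writes fall into each range. */
    static int[] writesPerRegion(long[] splitPoints, long[] keys) {
        int[] counts = new int[splitPoints.length + 1];
        for (long key : keys) {
            counts[regionFor(splitPoints, key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] splitPoints = {100L, 200L, 300L}; // hypothetical split points
        long[] newKeys = new long[10];
        for (int i = 0; i < newKeys.length; i++) {
            newKeys[i] = 301L + i;               // ascending timestamps
        }
        // prints [0, 0, 0, 10]
        System.out.println(Arrays.toString(writesPerRegion(splitPoints, newKeys)));
    }
}
```

The sketch ignores splitting. As far as I understand, ranges in a real
table split as they grow, but new ascending keys still arrive at the
tail.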

2009/7/17 Jonathan Gray <jl...@streamy.com>:
> If you want time descending order, just use Long.MAX_VALUE - stamp.
>
> When reading the value you will have to take Long.MAX_VALUE - stored_value =
> stamp;
>
> JG

Re: slides about some case studies of hbase table schema design

Posted by Jonathan Gray <jl...@streamy.com>.

If you want time descending order, just use Long.MAX_VALUE - stamp.

When reading the value back, recover the original with
stamp = Long.MAX_VALUE - stored_value.

JG
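
A minimal sketch of this trick in plain Java, assuming non-negative
millisecond timestamps and an 8-byte big-endian key encoding. The class
and method names are illustrative, not part of the HBase API:

```java
import java.nio.ByteBuffer;

/** Illustrative only: the Long.MAX_VALUE - stamp trick for newest-first ordering. */
final class ReverseTimestamp {

    /** Value to store in the key so that newer stamps sort first (stamp must be >= 0). */
    static long reverse(long stamp) {
        return Long.MAX_VALUE - stamp;
    }

    /** Recovers the original stamp from the stored value. */
    static long restore(long stored) {
        return Long.MAX_VALUE - stored;
    }

    /** 8-byte big-endian encoding of a long. */
    static byte[] toKey(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Unsigned, byte-by-byte comparison, the usual order for byte-array keys. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] olderKey = toKey(reverse(1000L));
        byte[] newerKey = toKey(reverse(2000L));
        // The newer stamp produces the smaller key, so it is read first.
        System.out.println(compareKeys(newerKey, olderKey) < 0);
        System.out.println(restore(reverse(1000L)) == 1000L);
    }
}
```

For non-negative stamps the reversed value is also non-negative, so the
byte order of the keys matches the numeric order of the reversed values.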

Qingyan(Evan) Liu wrote:
> Hi Schubert,
>
> Your comment is really valuable. I'm thinking about how to balance
> write performance against ease of analysis. Most of the time we will
> scan over a range of time, for example to count the distinct access
> IPs for the last month (i.e. most of the tasks are time series
> analysis). If the keys are sorted by time, that analysis is easy. If
> the keys are partitioned by <userid>, and it is usually hard to
> enumerate all userids, then the same statistics are harder to compute.
>
> So I'm trying to find a better solution, and any suggestions are
> welcome. Thanks.
> 
> sincerely,
> Evan

Re: slides about some case studies of hbase table schema design

Posted by "Qingyan(Evan) Liu" <qi...@gmail.com>.

Hi Schubert,

Your comment is really valuable. I'm thinking about how to balance
write performance against ease of analysis. Most of the time we will
scan over a range of time, for example to count the distinct access
IPs for the last month (i.e. most of the tasks are time series
analysis). If the keys are sorted by time, that analysis is easy. If
the keys are partitioned by <userid>, and it is usually hard to
enumerate all userids, then the same statistics are harder to compute.

So I'm trying to find a better solution, and any suggestions are
welcome. Thanks.

sincerely,
Evan
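
As a rough illustration of why time-sorted keys make this kind of
analysis easy, the sketch below uses a java.util.TreeMap as a stand-in
for a table sorted by rowkey. The text encoding of <time><INC_COUNTER>
and all names are assumptions made for the example, not the schema from
the slides:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only: a range scan over time-sorted keys, with a TreeMap as the "table". */
final class TimeRangeScanSketch {

    /** Hypothetical <time><INC_COUNTER> key; zero padding keeps string order equal to time order. */
    static String rowKey(long timeMillis, int counter) {
        return String.format(Locale.ROOT, "%013d-%04d", timeMillis, counter);
    }

    /** Counts distinct IPs among rows whose time falls in [startMillis, endMillis). */
    static int distinctIps(NavigableMap<String, String> table, long startMillis, long endMillis) {
        String startRow = String.format(Locale.ROOT, "%013d", startMillis); // inclusive
        String stopRow = String.format(Locale.ROOT, "%013d", endMillis);    // exclusive
        Set<String> ips = new HashSet<>();
        // One contiguous slice of the sorted keys covers the whole time range.
        for (String ip : table.subMap(startRow, true, stopRow, false).values()) {
            ips.add(ip);
        }
        return ips.size();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey(100L, 0), "10.0.0.1");
        table.put(rowKey(100L, 1), "10.0.0.2");
        table.put(rowKey(150L, 0), "10.0.0.1");
        table.put(rowKey(200L, 0), "10.0.0.3"); // outside [100, 200)
        System.out.println(distinctIps(table, 100L, 200L));
    }
}
```

With keys led by <userid> instead, the same count would need one such
slice per user, which is the difficulty described above.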

2009/7/14 zsongbo <zs...@gmail.com>:
> Hi Qingyan (Evan),
>
> In the slides, Case 5: access log, you use <time><INC_COUNTER> as the
> rowkey.
>
> I think there is a problem: since the access log events are generated
> in time order, the rowkeys will be in ascending order. Then when we
> insert/load the access log into HBase, only the last tablet will be
> busy. Thus, the load is not balanced.
>
> It may be difficult to design an HBase schema for an access log,
> because it depends very much on the application.
>
> RowKey=<userid><time> may be a choice.
>
> Schubert Zhang

Re: slides about some case studies of hbase table schema design

Posted by zsongbo <zs...@gmail.com>.

Hi Qingyan (Evan),

In the slides, Case 5: access log, you use <time><INC_COUNTER> as the
rowkey.

I think there is a problem: since the access log events are generated
in time order, the rowkeys will be in ascending order. Then when we
insert/load the access log into HBase, only the last tablet will be
busy. Thus, the load is not balanced.

It may be difficult to design an HBase schema for an access log,
because it depends very much on the application.

RowKey=<userid><time> may be a choice.

Schubert Zhang
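
A minimal sketch of how a <userid><time> rowkey could be built, in plain
Java. The fixed-width, zero-padded user id and the 8-byte big-endian
time are assumptions for illustration, and the names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: composes a <userid><time> rowkey as a byte array. */
final class UserTimeKey {

    /** Fixed-width user id (zero padded on the right), then 8-byte big-endian time. */
    static byte[] rowKey(String userId, int userIdWidth, long timeMillis) {
        byte[] id = userId.getBytes(StandardCharsets.UTF_8);
        if (id.length > userIdWidth) {
            throw new IllegalArgumentException("userId longer than " + userIdWidth + " bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(userIdWidth + Long.BYTES);
        buf.put(id);
        buf.position(userIdWidth); // skipped bytes stay zero and act as padding
        buf.putLong(timeMillis);
        return buf.array();
    }

    /** Unsigned, byte-by-byte comparison of two keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] aliceLater = rowKey("alice", 8, 2000L);
        byte[] bobEarlier = rowKey("bob", 8, 1000L);
        // The user id dominates the order: alice's later row still sorts before bob's.
        System.out.println(compare(aliceLater, bobEarlier) < 0);
    }
}
```

With this layout the rows of one user are contiguous and ordered by
time, so writes from different users fall into different parts of the
key space. A scan over a time range across all users then has to visit
each user's range.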

On Tue, Jul 14, 2009 at 2:23 AM, stack <st...@duboce.net> wrote:

> Please add a link to the slides below to either the presentations or
> the articles page on the HBase wiki.
>
> Thanks for the excellent contribution; it fills a hole we have had in
> our documentation for a while now.
>
> Yours,
> St.Ack

Re: slides about some case studies of hbase table schema design

Posted by "Qingyan(Evan) Liu" <qi...@gmail.com>.

Hi stack,

I've added it to http://wiki.apache.org/hadoop/HBase/HBasePresentations

thanks.

sincerely,
Evan

2009/7/14 stack <st...@duboce.net>:
> Please add a link to the slides below to either the presentations or
> the articles page on the HBase wiki.
>
> Thanks for the excellent contribution; it fills a hole we have had in
> our documentation for a while now.
>
> Yours,
> St.Ack

Re: slides about some case studies of hbase table schema design

Posted by stack <st...@duboce.net>.

Please add a link to the slides below to either the presentations or
the articles page on the HBase wiki.

Thanks for the excellent contribution; it fills a hole we have had in
our documentation for a while now.

Yours,
St.Ack

On Mon, Jul 13, 2009 at 11:08 AM, Qingyan(Evan) Liu <qi...@gmail.com> wrote:

> Dear all,
>
> I've just finished some slides about some cases of designing hbase
> table schemas. Please have a look here:
> http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies
>
> The cases are mostly collected from websites. I put the slides together
> because I couldn't find a good guide on how to design HBase table
> schemas.
>
> Any suggestions or best practices that you want to share are welcome!
>
> Thanks a lot!
>
> sincerely,
> Evan
>