Posted to user@hbase.apache.org by "Qingyan(Evan) Liu" <qi...@gmail.com> on 2009/07/13 20:08:53 UTC
slides about some case studies of hbase table schema design
Dear all,
I've just finished some slides presenting case studies of HBase table
schema design. Please have a look here:
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies
Most of the cases were collected from websites. I did this because I
couldn't find any good guides on how to design HBase table schemas.
Any suggestions or best practices you'd like to share are welcome!
Thanks a lot!
sincerely,
Evan
Re: slides about some case studies of hbase table schema design
Posted by "Qingyan(Evan) Liu" <qi...@gmail.com>.
Hi Jonathan,
What we need is ascending order. The problem Schubert pointed out is
that when we insert time-series data into the HTable, only the last
tablet (region) is busy, which is slower than inserting into several
tablets simultaneously.
sincerely,
Evan
2009/7/17 Jonathan Gray <jl...@streamy.com>:
> If you want time descending order, just use Long.MAX_VALUE - stamp.
>
> When reading the value you will have to compute
> stamp = Long.MAX_VALUE - stored_value;
>
> JG
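To make the "only the last tablet is busy" point concrete, here is a minimal Java sketch under stated assumptions: regions are modelled as a sorted map from start key to a hypothetical region name (as in HBase, the first region has an empty start key), and row keys follow the <time><INC_COUNTER> pattern as 10-digit zero-padded epoch seconds plus a 6-digit counter. The split points, class and method names are illustrative only, not HBase API. Because new events always carry the latest timestamp, every new key sorts after the last split point.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: a row belongs to the region with the greatest
// start key that is <= the row key (this is not HBase client code).
class HotspotSketch {
    // Hypothetical region boundaries, keyed by zero-padded epoch seconds.
    static final TreeMap<String, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put("", "region-0");           // first region: empty start key
        REGIONS.put("1246000000", "region-1");
        REGIONS.put("1247000000", "region-2"); // last region: open-ended
    }

    // <time><INC_COUNTER>: fixed widths keep string order equal to time order.
    static String rowKey(long epochSeconds, int counter) {
        return String.format("%010d%06d", epochSeconds, counter);
    }

    static String regionFor(String rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        Map<String, Integer> writes = new TreeMap<>();
        long now = 1247510000L; // events arrive in time order from here on
        for (int i = 0; i < 1000; i++) {
            writes.merge(regionFor(rowKey(now + i, i)), 1, Integer::sum);
        }
        System.out.println(writes); // prints {region-0... no: see note below}
    }
}
```

Running it prints {region-2=1000}: all 1000 inserts go to the last region. As far as I understand, splitting that region does not help for long, since the newest keys always fall into whichever region is last.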
Re: slides about some case studies of hbase table schema design
Posted by Jonathan Gray <jl...@streamy.com>.
If you want time descending order, just use Long.MAX_VALUE - stamp.
When reading the value you will have to compute
stamp = Long.MAX_VALUE - stored_value;
JG
Qingyan(Evan) Liu wrote:
> Hi Schubert,
>
> Your comment is really valuable. I'm considering how to balance
> performance against ease of analysis. Most of the time we will scan
> over a time range, for example to count the distinct IPs in the access
> log for the last month (i.e. most of the tasks are time-series
> analyses). If the keys are sorted by time, such analysis is easy. If
> the keys are partitioned by <userid>, and it is usually hard to iterate
> over all userids, then computing the same statistics is harder.
>
> So I'm trying to find a better solution. Any suggestions are welcome.
> Thanks.
>
> sincerely,
> Evan
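Jonathan's reverse-timestamp trick can be sketched in Java as follows. The class and method names are hypothetical, and the sketch assumes stamps are non-negative (so the stored value is too). The key bytes are written big-endian with java.nio.ByteBuffer (its default byte order), on the assumption that row keys are compared as unsigned bytes; under that ordering, big-endian encoding of non-negative longs preserves numeric order. Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ReverseTimestamp {
    // Stored value: a newer (larger) stamp yields a smaller value, so it sorts first.
    static long encode(long stamp) {
        return Long.MAX_VALUE - stamp;
    }

    // Reading it back: stamp = Long.MAX_VALUE - stored_value.
    static long decode(long storedValue) {
        return Long.MAX_VALUE - storedValue;
    }

    // Big-endian bytes (ByteBuffer's default order). For non-negative longs,
    // unsigned lexicographic byte order matches numeric order.
    static byte[] toRowKeyBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        long older = 1247500000000L; // example millisecond stamps
        long newer = 1247800000000L;
        byte[] olderKey = toRowKeyBytes(encode(older));
        byte[] newerKey = toRowKeyBytes(encode(newer));
        System.out.println(Arrays.compareUnsigned(newerKey, olderKey) < 0); // true: newest first
        System.out.println(decode(encode(newer)) == newer);                 // true
    }
}
```

This only flips the sort order. As Evan notes in his reply, the keys still arrive as a monotonic (now decreasing) sequence, so on its own the trick does not spread inserts across several regions.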
Re: slides about some case studies of hbase table schema design
Posted by "Qingyan(Evan) Liu" <qi...@gmail.com>.
Hi Schubert,
Your comment is really valuable. I'm considering how to balance
performance against ease of analysis. Most of the time we will scan
over a time range, for example to count the distinct IPs in the access
log for the last month (i.e. most of the tasks are time-series
analyses). If the keys are sorted by time, such analysis is easy. If
the keys are partitioned by <userid>, and it is usually hard to iterate
over all userids, then computing the same statistics is harder.
So I'm trying to find a better solution. Any suggestions are welcome.
Thanks.
sincerely,
Evan
2009/7/14 zsongbo <zs...@gmail.com>:
> Hi Qingyan (Evan),
>
> In the slides, for Case 5 (access log), you use <time><INC_COUNTER> as
> the row key.
>
> I think there is a problem: since access log events are generated in
> time order, the row keys will be in ascending sequence. When we
> insert/load the access log into HBase, only one tablet (the last one)
> will be busy, so the load is not balanced.
>
> It may be difficult to design an HBase schema for access logs, because
> it depends very much on the application.
>
> RowKey=<userid><time> may be one option.
>
> Schubert Zhang
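The trade-off Evan describes can be illustrated with a small in-memory model: a java.util.TreeMap stands in for a table's sorted rows, and subMap stands in for a range scan. Everything here is a hypothetical sketch (the Event class, the '|' separator, the zero-padded key widths); it shows the shape of the two access patterns, not HBase client code. With <time><INC_COUNTER> keys, a one-month window is a single contiguous range; with <userid><time> keys, the same query needs one range per user, and the caller must already know every userid.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class ScanLayoutSketch {
    // Hypothetical access-log row: who, when (epoch seconds), and from which IP.
    static final class Event {
        final String userId; final long time; final String ip;
        Event(String userId, long time, String ip) { this.userId = userId; this.time = time; this.ip = ip; }
    }

    static String ts(long t) { return String.format("%010d", t); }

    // Layout A: <time><INC_COUNTER>; rows are sorted by time.
    static TreeMap<String, String> byTime(List<Event> events) {
        TreeMap<String, String> rows = new TreeMap<>();
        int counter = 0;
        for (Event e : events) rows.put(ts(e.time) + String.format("%06d", counter++), e.ip);
        return rows;
    }

    // Layout B: <userid><time> (hypothetical '|' separator); rows are sorted by user first.
    static TreeMap<String, String> byUser(List<Event> events) {
        TreeMap<String, String> rows = new TreeMap<>();
        int counter = 0;
        for (Event e : events) rows.put(e.userId + "|" + ts(e.time) + String.format("%06d", counter++), e.ip);
        return rows;
    }

    // Layout A: one contiguous range [start, end) covers the whole window.
    static int distinctIpsByTime(TreeMap<String, String> rows, long start, long end) {
        return new HashSet<>(rows.subMap(ts(start), ts(end)).values()).size();
    }

    // Layout B: one range per user, so the caller must supply every userid.
    static int distinctIpsByUser(TreeMap<String, String> rows, List<String> userIds, long start, long end) {
        Set<String> ips = new HashSet<>();
        for (String u : userIds) {
            ips.addAll(rows.subMap(u + "|" + ts(start), u + "|" + ts(end)).values());
        }
        return ips.size();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("alice", 1246400000L, "10.0.0.1"),
            new Event("bob",   1246500000L, "10.0.0.2"),
            new Event("alice", 1246600000L, "10.0.0.2"),
            new Event("carol", 1249000000L, "10.0.0.3")); // outside the window below
        long start = 1246400000L, end = 1249000000L;
        System.out.println(distinctIpsByTime(byTime(events), start, end)); // 2
        System.out.println(distinctIpsByUser(byUser(events), List.of("alice", "bob", "carol"), start, end)); // 2
    }
}
```

Both layouts give the same answer, but layout B turns one scan into as many scans as there are users, which is the difficulty Evan raises.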
Re: slides about some case studies of hbase table schema design
Posted by zsongbo <zs...@gmail.com>.
Hi Qingyan (Evan),
In the slides, for Case 5 (access log), you use <time><INC_COUNTER> as
the row key.
I think there is a problem: since access log events are generated in
time order, the row keys will be in ascending sequence. When we
insert/load the access log into HBase, only one tablet (the last one)
will be busy, so the load is not balanced.
It may be difficult to design an HBase schema for access logs, because
it depends very much on the application.
RowKey=<userid><time> may be one option.
Schubert Zhang
On Tue, Jul 14, 2009 at 2:23 AM, stack <st...@duboce.net> wrote:
> Please add a link to the below on the HBase wiki, under either
> presentations or articles.
>
> Thanks for the excellent contribution; it fills a hole we have had in
> our documentation for a while now.
>
> Yours,
> St.Ack
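Schubert's alternative, RowKey=<userid><time>, can be sketched the same way. In this hypothetical model (the region start keys, the '|' separator and the fixed-width time are all assumptions, and none of it is HBase API), writes arriving at the same instant from different users fall into different regions, because the key is led by the userid rather than the time.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeySketch {
    // Hypothetical region start keys for a table keyed by <userid><time>.
    static final TreeMap<String, String> REGIONS = new TreeMap<>(Map.of(
        "", "region-0",
        "h", "region-1",
        "p", "region-2"));

    // <userid><time>: hypothetical '|' separator, zero-padded epoch seconds.
    static String rowKey(String userId, long epochSeconds) {
        return userId + "|" + String.format("%010d", epochSeconds);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    static String regionFor(String rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    // Counts how many of the given users' writes at one instant land in each region.
    static Map<String, Integer> writesPerRegion(List<String> users, long now) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String user : users) {
            counts.merge(regionFor(rowKey(user, now)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> users = List.of("alice", "bob", "kate", "mike", "susan", "zoe");
        System.out.println(writesPerRegion(users, 1247510000L));
        // prints {region-0=2, region-1=2, region-2=2}
    }
}
```

How evenly the load spreads depends on how userids are distributed relative to the region boundaries, and the cost is the one Evan points out: time-range statistics now need a scan per user.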
Re: slides about some case studies of hbase table schema design
Posted by "Qingyan(Evan) Liu" <qi...@gmail.com>.
Hi stack,
I've added it to the wiki here:
http://wiki.apache.org/hadoop/HBase/HBasePresentations
Thanks.
sincerely,
Evan
2009/7/14 stack <st...@duboce.net>:
> Please add a link to the below on the HBase wiki, under either
> presentations or articles.
>
> Thanks for the excellent contribution; it fills a hole we have had in
> our documentation for a while now.
>
> Yours,
> St.Ack
Re: slides about some case studies of hbase table schema design
Posted by stack <st...@duboce.net>.
Please add a link to the below on the HBase wiki, under either
presentations or articles.
Thanks for the excellent contribution; it fills a hole we have had in
our documentation for a while now.
Yours,
St.Ack
On Mon, Jul 13, 2009 at 11:08 AM, Qingyan(Evan) Liu <qi...@gmail.com> wrote:
> Dear all,
>
> I've just finished some slides presenting case studies of HBase table
> schema design. Please have a look here:
> http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies
>
> Most of the cases were collected from websites. I did this because I
> couldn't find any good guides on how to design HBase table schemas.
>
> Any suggestions or best practices you'd like to share are welcome!
>
> Thanks a lot!
>
> sincerely,
> Evan
>