
Posted to user@hbase.apache.org by canucks <an...@gmail.com> on 2010/01/21 04:11:41 UTC

learning hbase - schema design advice

Hi,

i'm pretty interested in learning hbase.  what i want to do is store
financial data for analytical/graphing/display purposes.  there are hundreds
of millions of rows and, of course, i want fast responses when retrieving the
data.

if i were to do it in an RDBMS, the table would be
REPORT, MARKET, OPERATING_DATE, OPERATING_INTERVAL, HOUR_ENDING, VALUE
where the bolded column names form the PK.  if i were to store this in hbase,
would it look like this?

REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING.TIMESTAMP{
	VALUE: 92.29
}

so that i can do queries like below:
- give me all reports with the name of "ABC"
- give me all the values where OPERATING_DATE is from jan-01-2010 to
jan-10-2010
- give me all the values where OPERATING_DATE is from jan-01-2010 to
jan-10-2010 and HOUR_ENDING is between 5 and 10 (or simply 5 or variations
thereof)
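A minimal sketch of how the proposed row key could be assembled, assuming HBase's lexicographic (byte-wise) ordering of row keys. The class and method names, the market "EAST", yyyyMMdd dates (instead of jan-01-2010) and two-digit zero padding are illustrative assumptions. A java.util.TreeMap stands in for the table, and Java's String ordering matches byte ordering only for ASCII keys.

```java
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for the proposed key REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING.
// Assumes ASCII field values that never contain '.', dates written as yyyyMMdd,
// and interval/hour values below 100 (two-digit padding).
final class ReportRowKey {

    private ReportRowKey() {}

    static String rowKey(String report, String market, String operatingDate,
                         int operatingInterval, int hourEnding) {
        // Zero-padding makes string order follow numeric order: "05" < "10", but "5" > "10".
        return String.format(Locale.ROOT, "%s.%s.%s.%02d.%02d",
                report, market, operatingDate, operatingInterval, hourEnding);
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted, standing in for an HBase table's sorted row keys.
        NavigableMap<String, Double> table = new TreeMap<>();
        for (int hour : List.of(10, 5, 7)) {
            // 92.29 is the sample value from the question, reused for every row.
            table.put(rowKey("ABC", "EAST", "20100101", 1, hour), 92.29);
        }
        // Keys print in sorted order: hours 05, 07, 10 for the same report/market/date.
        table.keySet().forEach(System.out::println);
    }
}
```

With REPORT as the leading field, "all reports named ABC" corresponds to one contiguous key range, while a date range across all reports does not, because OPERATING_DATE is the third field. The sketch leaves out the trailing TIMESTAMP from the proposal to keep it short.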

in short, is hbase the wrong way to go about it, or would it yield better
performance?  also, do you folks happen to know any good links/articles on
hbase table & schema?

thanks
-- 
View this message in context: http://old.nabble.com/learning-hbase---schema-design-advice-tp27252203p27252203.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: learning hbase - schema design advice

Posted by Edward Capriolo <ed...@gmail.com>.

On Thu, Jan 21, 2010 at 1:09 AM, Dan Washusen <da...@reactive.org> wrote:
> Have you read the Bigtable paper linked off the front page of HBase?  It
> does a good job of explaining the concepts.  Basically it's a distributed
> sorted map (think java.util.NavigableMap but split over many machines).  If
> you know the key of the row you are looking for HBase can fetch it very
> quickly.  If you don't know the key you'll have to resort to scanning all
> the rows to find the data you are interested in (just like a SQL query that
> can't take advantage of an index)...
>
> Do the queries need to reflect writes immediately, or is it sufficient
> for them to be eventually consistent?  If you can live with eventual
> consistency, you could write some MapReduce jobs that duplicate a
> master table into reporting tables (as you would for data
> warehousing/reporting on an RDBMS).
>
> I'm sure some of the more experienced users will have more insight but that
> might get you started...
>
> Cheers,
> Dan
>
> p.s. bold text doesn't seem to come through the mailing list...
>
I went looking for a paper on "how to convert my RDBMS mindset to a
key-value store mindset". Here is something that got me started:

http://s-expressions.com/2009/03/08/hbase-on-designing-schemas-for-column-oriented-data-stores/

Re: learning hbase - schema design advice

Posted by Dan Washusen <da...@reactive.org>.

Have you read the Bigtable paper linked off the front page of HBase?  It
does a good job of explaining the concepts.  Basically it's a distributed
sorted map (think java.util.NavigableMap but split over many machines).  If
you know the key of the row you are looking for HBase can fetch it very
quickly.  If you don't know the key you'll have to resort to scanning all
the rows to find the data you are interested in (just like a SQL query that
can't take advantage of an index)...
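To make the sorted-map analogy concrete, here is a single-JVM sketch with java.util.TreeMap (a NavigableMap) standing in for one table. It contrasts a lookup by known row key, a scan over a known key range, and a full scan with a predicate when the key is not known. Region splitting across machines is not modeled, and the class name, method names and sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Single-JVM stand-in for the "distributed sorted map" idea: one TreeMap of row key -> value.
// HBase also splits the sorted key space into regions on different machines; not modeled here.
final class SortedMapLookups {

    private SortedMapLookups() {}

    // Row key known: a single lookup (O(log n) in a TreeMap).
    static Double get(NavigableMap<String, Double> table, String rowKey) {
        return table.get(rowKey);
    }

    // Key range known: only rows in [startRow, stopRow) are visited.
    static NavigableMap<String, Double> scan(NavigableMap<String, Double> table,
                                             String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    // Key unknown: every row must be read and tested, like a SQL query that cannot use an index.
    static List<String> fullScan(NavigableMap<String, Double> table, Predicate<String> keyMatches) {
        List<String> hits = new ArrayList<>();
        for (String rowKey : table.keySet()) {
            if (keyMatches.test(rowKey)) {
                hits.add(rowKey);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> table = new TreeMap<>();
        // Made-up sample rows in the REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING layout.
        table.put("ABC.EAST.20100101.01.05", 92.29);
        table.put("ABC.EAST.20100102.01.05", 91.50);
        table.put("XYZ.WEST.20100101.01.05", 75.00);

        System.out.println(get(table, "ABC.EAST.20100101.01.05"));
        // "ABC." up to (not including) "ABC/" covers every key starting with "ABC.".
        System.out.println(scan(table, "ABC.", "ABC/").size());
        // OPERATING_DATE is not the leading key field, so a date query falls back to a full scan.
        System.out.println(fullScan(table, k -> k.split("\\.")[2].equals("20100101")));
    }
}
```

The inclusive start and exclusive stop in scan mirror how an HBase scan's start row and stop row behave, but the sketch does not use the HBase client API.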

Do the queries need to reflect writes immediately, or is it sufficient
for them to be eventually consistent?  If you can live with eventual
consistency, you could write some MapReduce jobs that duplicate a
master table into reporting tables (as you would for data
warehousing/reporting on an RDBMS).
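The reporting-table idea can be pictured as a re-keying step. This is a rough sketch under stated assumptions, not Hadoop or HBase API code: a plain loop stands in for the map phase of a MapReduce job, TreeMaps stand in for the master and reporting tables, and the date-first key layout OPERATING_DATE.HOUR_ENDING.REPORT.MARKET.OPERATING_INTERVAL, the yyyyMMdd dates and all names are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical re-keying step: copies a master table keyed
//   REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING
// into a reporting table keyed
//   OPERATING_DATE.HOUR_ENDING.REPORT.MARKET.OPERATING_INTERVAL
// A plain loop stands in for the map phase of a MapReduce job.
final class ReportingTableBuilder {

    private ReportingTableBuilder() {}

    static String dateFirstKey(String masterKey) {
        String[] f = masterKey.split("\\.");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + masterKey);
        }
        // f[0]=REPORT, f[1]=MARKET, f[2]=OPERATING_DATE, f[3]=OPERATING_INTERVAL, f[4]=HOUR_ENDING
        return String.join(".", f[2], f[4], f[0], f[1], f[3]);
    }

    static NavigableMap<String, Double> buildReportingTable(NavigableMap<String, Double> master) {
        NavigableMap<String, Double> reporting = new TreeMap<>();
        master.forEach((k, v) -> reporting.put(dateFirstKey(k), v));
        return reporting;
    }

    // Dates are yyyyMMdd strings; stopDateExclusive is the day after the last day wanted.
    static NavigableMap<String, Double> dateRange(NavigableMap<String, Double> reporting,
                                                  String startDate, String stopDateExclusive) {
        return reporting.subMap(startDate, true, stopDateExclusive, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> master = new TreeMap<>();
        // Made-up sample rows in the master-table key layout.
        master.put("ABC.EAST.20100101.01.05", 92.29);
        master.put("XYZ.WEST.20100105.01.07", 75.00);
        master.put("ABC.EAST.20100115.01.05", 80.00);

        NavigableMap<String, Double> reporting = buildReportingTable(master);
        // jan-01-2010 .. jan-10-2010 inclusive becomes one contiguous range in the reporting table.
        dateRange(reporting, "20100101", "20100111")
                .forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because the reporting table is rebuilt from the master table by a batch job, it only reflects writes as of the last run, which is the eventual-consistency trade-off described in the paragraph above.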

I'm sure some of the more experienced users will have more insight but that
might get you started...

Cheers,
Dan

p.s. bold text doesn't seem to come through the mailing list...

RE: learning hbase - schema design advice

Posted by Aryeh Berkowitz <ar...@iswcorp.com>.

I found this to be very helpful:
http://data-tactics.com/techtips/cloud_data_structure_diagramming.pdf

-----Original Message-----
From: canucks [mailto:anhlon@gmail.com] 
Sent: Wednesday, January 20, 2010 10:12 PM
To: hbase-user@hadoop.apache.org
Subject: learning hbase - schema design advice

Re: learning hbase - schema design advice

Posted by canucks <an...@gmail.com>.

thanks for all the tips.  i'll read through the links given and try it out.

Re: learning hbase - schema design advice

Posted by stack <st...@duboce.net>.

Oh, and ... "No Relation: The Mixed Blessings of Non-Relational
Databases" [PDF]. Get it at http://j.mp/2PjPB
St.Ack

On Thu, Jan 21, 2010 at 11:23 AM, stack <st...@duboce.net> wrote:
> On Wed, Jan 20, 2010 at 7:11 PM, canucks <an...@gmail.com> wrote:
>>.. also, you folks happen to know any good links/articles on
>> hbase table & schema?
>
> http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies
> St.Ack
>

Re: learning hbase - schema design advice

Posted by stack <st...@duboce.net>.

On Wed, Jan 20, 2010 at 7:11 PM, canucks <an...@gmail.com> wrote:
>.. also, you folks happen to know any good links/articles on
> hbase table & schema?

http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies
St.Ack