
Posted to user@hbase.apache.org by Yves Langisch <yv...@langisch.ch> on 2011/04/16 10:27:51 UTC

Schema design question

As I'm about to plan a similar app, I have studied the HBase schema of the OpenTSDB project:

http://opentsdb.net/schema.html

The OpenTSDB approach seems to use many rows instead of many columns. Which schema design is better in terms of query performance? My experience so far is that a wide schema, with many columns but fewer rows, performs better. A 'horizontal' table scan seems to be better suited for fast queries.

Yves

Re: Schema design question

Posted by Ted Dunning <td...@maprtech.com>.

I think that your mileage will definitely vary on this point.  Your
design may work very well.  Or not.  I would worry just a bit if your
data points are large enough to create a really massive row (greater
than about a megabyte).
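The one-megabyte threshold can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate only: it assumes the classic HBase KeyValue layout (20 bytes of fixed overhead per cell for the key and value lengths, row length, family length, timestamp and key type), ignores compression, block indexes and cell tags, and the function name and example field sizes are hypothetical.

```python
# Rough, hypothetical estimate of the uncompressed size of one HBase row.
# Fixed per-cell overhead from the classic KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1) = 20 bytes.
KEYVALUE_FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1


def estimate_row_bytes(columns, row_key_len, family_len, qualifier_len, value_len):
    """Estimate the bytes for one row holding `columns` cells.

    Every cell repeats the row key, family and qualifier, so a wide row
    pays the row-key cost once per column, not once per row.
    Compression, block indexes and cell tags are ignored.
    """
    per_cell = (KEYVALUE_FIXED_BYTES + row_key_len + family_len
                + qualifier_len + value_len)
    return columns * per_cell


if __name__ == "__main__":
    # Illustrative sizes only: 7-byte row key, 1-byte family, 2-byte qualifier.
    small = estimate_row_bytes(3600, 7, 1, 2, 8)    # 8-byte values
    large = estimate_row_bytes(3600, 7, 1, 2, 300)  # 300-byte values
    print(small, large)  # 136800 1188000
```

With small numeric values, even 3600 columns stay far below a megabyte; under these assumed sizes it takes data points of a few hundred bytes each before a single row crosses that threshold.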

On Sun, Apr 17, 2011 at 11:48 PM, Yves Langisch <yv...@langisch.ch> wrote:
> So I wonder whether query performance could be improved by using 60-minute periods, which would allow up to 3600 columns per row, assuming that all columns are needed and no filtering is done. Basically, the question is whether a wide (horizontal) design with many columns is better than a vertical one with many rows for such a scenario.

Re: Schema design question

Posted by Yves Langisch <yv...@langisch.ch>.

Yes, you're right. They have a row for each 10-minute period. Inside a row, the columns are offsets in seconds within that 10-minute period, which gives a maximum of 10 × 60 = 600 columns per row. Normally a row has fewer columns, as you don't have a data point for every second.
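The row/offset split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenTSDB's actual encoding: it models only the time part of the row key (the linked schema page describes the full key), assumes integer Unix timestamps in seconds, and the helper names are hypothetical. `period_s=600` models the 10-minute rows; `period_s=3600` models a 60-minute variant.

```python
# Hypothetical helpers; not OpenTSDB's real encoding. Only the time part of
# the row key is modelled. Timestamps are integer Unix seconds.


def split_timestamp(ts, period_s=600):
    """Return (row_base, offset): the period start that goes into the row key
    and the offset in seconds that becomes the column qualifier."""
    row_base = ts - (ts % period_s)
    offset = ts - row_base
    return row_base, offset


def group_into_rows(timestamps, period_s=600):
    """Map each row base to the sorted, distinct column offsets it holds."""
    rows = {}
    for ts in timestamps:
        row_base, offset = split_timestamp(ts, period_s)
        rows.setdefault(row_base, set()).add(offset)
    return {base: sorted(offsets) for base, offsets in rows.items()}


if __name__ == "__main__":
    # One data point every 30 seconds for 20 minutes: two 10-minute rows,
    # each holding 20 of the 600 possible columns.
    for base, offsets in group_into_rows(range(0, 1200, 30)).items():
        print(base, len(offsets), offsets[:3])
    # 0 20 [0, 30, 60]
    # 600 20 [0, 30, 60]
```

Because the offset is always in `[0, period_s)`, a 600-second period caps a row at 600 columns and a 3600-second period at 3600.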

So I wonder whether query performance could be improved by using 60-minute periods, which would allow up to 3600 columns per row, assuming that all columns are needed and no filtering is done. Basically, the question is whether a wide (horizontal) design with many columns is better than a vertical one with many rows for such a scenario.
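The trade-off in the question can be made concrete by counting rows versus cells for one day of data. The sketch assumes at most one data point per second (one column per one-second offset) and rows aligned to multiples of the period, as in the 10-minute scheme; the function names are hypothetical. It only counts; it says nothing about which layout is faster.

```python
def rows_touched(start, end, period_s):
    """Number of period-aligned rows a scan over [start, end) must read."""
    if end <= start:
        return 0
    return (end - 1) // period_s - start // period_s + 1


def scan_shape(start, end, period_s):
    """Return (rows read, upper bound on cells read), assuming at most
    one data point, and hence one column, per second."""
    return rows_touched(start, end, period_s), max(0, end - start)


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    for period_s in (600, 3600):
        print(period_s, scan_shape(0, one_day, period_s))
    # 600 (144, 86400)
    # 3600 (24, 86400)
```

The upper bound on cells read is the same in both layouts; the wider rows only reduce the number of distinct rows (144 versus 24 per day), so any difference in query time would presumably come from per-row overhead rather than from the number of cells.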

On Apr 16, 2011, at 11:51 PM, Ted Dunning wrote:

> TsDB has more columns than it appears at first glance.  They store all of the observations for a relatively long time interval in a single row.
> 
> You may have spotted that right off (I didn't).

Re: Schema design question

Posted by Ted Dunning <td...@maprtech.com>.

TsDB has more columns than it appears at first glance.  They store all of
the observations for a relatively long time interval in a single row.

You may have spotted that right off (I didn't).
