Posted to user@hbase.apache.org by Sean Bigdatafun <se...@gmail.com> on 2010/06/14 08:52:26 UTC

HBase and Star Schema

I am reading a blog post about HBase and its application to OLAP:
http://www.jroller.com/otis/entry/hbase_vs_rdbms_star_schema
In that blog, Jean-Daniel mentioned that "If you can afford to denormalize
your data by putting the dimension table data into the same table as the
fact table, then you can get very good read efficiency. For each dimension,
you would have a column family." Can someone give me more details about this
comment?
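As I understand that suggestion, the denormalized fact row would carry the dimension attributes alongside the measures, with one column family per dimension, so a single row read replaces the star-schema joins. Here is a rough sketch in Python of how one such row might be laid out (the family names, qualifiers, and composite key format are my own illustration, not from the blog):

```python
# Hypothetical denormalized fact row: one column family per dimension,
# so reading the row returns the measures plus every dimension
# attribute without any join. HBase cells are family:qualifier -> value;
# simulated here with a plain dict of byte-string-like values.

def make_fact_row(date, store, product, buyer_age, sales, units):
    """Build the cell map for a single sale fact."""
    return {
        # dimension data, copied (denormalized) into the fact row
        "date:day": date,
        "store:name": store,
        "product:name": product,
        "buyer:age": str(buyer_age),
        # the measures themselves
        "m:sales": str(sales),
        "m:units": str(units),
    }

# e.g. a composite row key built from the dimension values
row_key = "2010-06-14|store42|widget|35"
row = make_fact_row("2010-06-14", "store42", "widget", 35, 3, 7)

# One read yields what a star-schema join would have produced:
print(row_key, row["store:name"], row["m:units"])
```

The cost, of course, is that every dimension attribute is repeated in every fact row, which trades storage for read efficiency.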


I understand Zohmg did some work in this area, but when I read the thesis
related to this project (
http://github.com/zohmg/zohmg/raw/master/doc/report/msc-report.pdf), it does
not seem to use the approach that Jean-Daniel suggested (page 32 --
Storage/Data Model describes how Zohmg stores data). Actually, I am not sure
Zohmg's approach can even scale to a very large dataset with many
dimensions -- the storage space will blow up.



Can someone give me a detailed explanation of both of the above approaches
to implementing a star schema? Let's say we are trying to model the
following problem:

"(date, store_name, product_name, buyer_age) ---> (number of sales, total
number sold)"
In other words, we want to build an OLAP cube from the above 4 dimensions:
date, store name, product name, and buyer age (these correspond to the
dimension tables in the star-schema world).
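For concreteness, one common pattern for a cube like this (not necessarily what Zohmg does) is to encode the dimension values into a composite row key, ordered so that a prefix scan rolls up along the leading dimensions, and keep the measures as columns. A toy Python sketch, with all names and the key format invented for illustration:

```python
from collections import defaultdict

SEP = "|"

def cube_key(date, store, product, age):
    # Composite row key; rows sort by date first, so a scan over a
    # key prefix aggregates along the leading dimensions.
    return SEP.join([date, store, product, str(age)])

# Toy "table": row key -> {measure: value}
table = defaultdict(lambda: {"sales": 0, "units": 0})

def record_sale(date, store, product, age, units):
    # Increment both measures for this cell of the cube.
    cell = table[cube_key(date, store, product, age)]
    cell["sales"] += 1
    cell["units"] += units

record_sale("2010-06-14", "store42", "widget", 35, 2)
record_sale("2010-06-14", "store42", "widget", 35, 5)
record_sale("2010-06-14", "store42", "gadget", 41, 1)

# Prefix scan = roll-up over (date, store), summing across the
# trailing dimensions (product, age):
prefix = "2010-06-14" + SEP + "store42" + SEP
total_units = sum(c["units"] for k, c in table.items()
                  if k.startswith(prefix))
print(total_units)  # 8
```

The drawback, which is what worries me about scaling, is that queries grouped by a non-leading dimension (say, buyer_age alone) cannot use a prefix scan, so you either scan everything or materialize additional key orderings, multiplying the storage.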



Thanks,
Sean