Posted to dev@kylin.apache.org by Li Yang <li...@apache.org> on 2015/06/04 07:55:08 UTC
Re: How to determine the exact number of cuboids (dimension combinations) that will be generated

Don't be scared by the number of rows in HBase. They are highly compressed
in storage. You can check the actual size of the HBase tables from the HBase
console to get an accurate picture of the real storage footprint. The Kylin
GUI also shows the original table size, the cube size, and the inflation rate.

Small/Medium/Large is a means of controlling HBase region size; it is not
related to how many cuboids are aggregated. What matters is the number of
dimensions and their types. "Derived" dimensions are not stored in the cube
(they get looked up at query time), so they are the most lightweight.
"Hierarchy" dimensions generate fewer cuboids than normal dimensions.
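
To make the effect of dimension types concrete, here is a rough sketch of
the counting, not Kylin's actual planner. It assumes the usual simplified
model: n normal dimensions give 2^n combinations, a hierarchy with k levels
gives only k + 1 valid combinations (none, level 1, levels 1-2, and so on),
and derived dimensions add nothing by themselves. The function name
estimate_cuboids is made up for illustration, and the sketch ignores
aggregation groups and any key columns Kylin may keep in place of derived
dimensions, so treat the result as an estimate only.

```python
def estimate_cuboids(normal_dims, hierarchy_levels=()):
    """Estimate the number of dimension combinations (cuboids).

    normal_dims      -- number of normal ("column") dimensions
    hierarchy_levels -- number of levels in each hierarchy, e.g. (4, 2)

    Simplified model (an assumption, not Kylin's real algorithm):
    derived dimensions are not counted, aggregation groups are ignored.
    """
    count = 2 ** normal_dims          # each normal dimension is in or out
    for levels in hierarchy_levels:
        count *= levels + 1           # a hierarchy is cut at one of its levels, or absent
    return count


# Two normal dimensions, everything else derived.
print(estimate_cuboids(2))

# Two normal dimensions plus a 4-level and a 2-level hierarchy.
print(estimate_cuboids(2, (4, 2)))

# The same 8 columns if they were all normal dimensions.
print(estimate_cuboids(8))
```

Under this model, two normal dimensions give 4 combinations, adding the two
hierarchies gives 4 * 5 * 3 = 60, and declaring the same eight columns as
normal dimensions would give 256.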

Cheers
Yang

On Wed, May 27, 2015 at 5:53 PM, Puneet Gupta <pu...@gmail.com>
wrote:

> Thanks for the reply, Bin Mahone.
>
> I had seen the ppt before and based on it I tried creating my cube. But
> there is no way mentioned to know how many aggregates actually get created
> for a small-size cube. I am assuming that for the same aggregate group a
> different number of aggregates/cuboids gets created based on cube size
> (small/medium/large).
>
> Also, how can one have even more fine-grained control over the cuboids
> that are created?
>
> On May 27, 2015 2:00 PM, "hongbin ma" <ma...@apache.org> wrote:
>
> > this link might be helpful:
> >
> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin?qid=145080e7-4abe-42c9-8048-f29ffec8a66c&v=default&b=&from_search=10
> >
> > On Wed, May 27, 2015 at 4:17 PM, Luke Han <lu...@gmail.com> wrote:
> >
> >> Forward to mailing list for further support.
> >>
> >> Thanks.
> >>
> >> On Wednesday, May 27, 2015 at 4:11:09 PM UTC+8, Puneet Gupta wrote:
> >>>
> >>> Hi,
> >>>
> >>> Is there any log message that I can look for to determine the exact
> >>> number of cuboids (dimension combinations) that will be generated?
> >>>
> >>>
> >>> I have a small fact table with 10,000 rows. The number of dimensions
> >>> is 11.
> >>> When I arrange the dimensions such that 2 are of type "column" and 9
> >>> are of type "derived" and I choose cube size = "small" in the GUI, I
> >>> get close to 830,000 rows in the HBase aggregate/cuboids table.
> >>>
> >>> When I arrange the dimensions such that 2 are of type "hierarchy"
> >>> (Hierarchy1 has 4 levels: Year, Month, Day, Hour; Hierarchy2 has 2
> >>> levels: Protocol Category and Protocol), 2 are of type "column" and
> >>> the rest are of type "derived", and I choose cube size = "small" in
> >>> the GUI, I get close to 2,000,000 rows in the HBase aggregate/cuboids
> >>> table.
> >>>
> >>> In both cases I selected dictionary compression for row keys.
> >>>
> >>> I wanted to control how many aggregates get generated. I feel the size
> >>> of the aggregate table is too high.
> >>>
> >>> Any suggestions?
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>