Posted to user@kylin.apache.org by peter zhang <pe...@gmail.com> on 2016/12/09 03:44:31 UTC

Cube tuning

I built a cube. The first time, without any tuning or aggregation group
settings, the cube size was about 20 GB.
Then, following the tuning document, I added some aggregation groups, and
the cube was reduced to 71.02 MB. Unfortunately, query performance also got
worse: most queries now take more than 10 seconds.
I don't understand the difference between putting all columns in one
aggregation group and splitting them across several groups. In my case, I
created 6 aggregation groups.

Can anyone help check my JSON schema?

Thanks in advance.

Re: Cube tuning

Posted by Billy Liu <bi...@apache.org>.

You are correct. Putting dimensions that are queried together into one
aggregation group is the right approach. Suppose you have dimensions A, B
and C. If A and B are in aggregation group 1, and B and C are in group 2,
then Kylin will pre-compute AB and BC, but not AC. That means a query by AB
will respond quickly, but for a query by AC, Kylin has to post-aggregate
the result from other cuboids at query time, which slows the query down.
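
The A/B/C example above can be sketched in a few lines of Python. This is
only a simplified model, not Kylin code: the function names are made up for
illustration, and it assumes that the base cuboid (all dimensions) is always
built and serves as the fallback when no exact cuboid exists.

```python
from itertools import combinations

def cuboids_for_group(dims):
    """All non-empty dimension combinations inside one aggregation group."""
    return {frozenset(c)
            for r in range(1, len(dims) + 1)
            for c in combinations(dims, r)}

def precomputed_cuboids(groups):
    """Union of every group's cuboids, plus the base cuboid."""
    result = set()
    for group in groups:
        result |= cuboids_for_group(group)
    # Assumption: the base cuboid (all dimensions) is always built.
    result.add(frozenset(d for group in groups for d in group))
    return result

def best_cuboid(query_dims, cuboids):
    """Smallest precomputed cuboid that contains every queried dimension."""
    wanted = frozenset(query_dims)
    candidates = [c for c in cuboids if wanted <= c]
    return min(candidates, key=lambda c: (len(c), sorted(c)))

groups = [["A", "B"], ["B", "C"]]
cuboids = precomputed_cuboids(groups)

for query in (["A", "B"], ["A", "C"]):
    hit = best_cuboid(query, cuboids)
    status = "exact match" if hit == frozenset(query) else "needs post-aggregation"
    print(sorted(query), "->", sorted(hit), status)
```

In this model the AB query is served by the AB cuboid directly, while the AC
query has to fall back to a larger cuboid and aggregate away B at query time.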

You could define multiple hierarchies in different aggregation groups; most
of the time, this gives the same performance as defining them all in one
aggregation group. They are rules that tell Kylin how to combine the
dimensions.
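
To make "rules for combining dimensions" more concrete, here is a rough
Python sketch of how hierarchy and joint rules shrink the number of
combinations inside a single aggregation group. It reflects my understanding
of the rules (a hierarchy keeps only its prefixes, a joint group appears all
together or not at all); the function and parameter names are hypothetical,
and the real cuboid scheduler in Kylin may differ in details.

```python
from math import prod

def group_cuboid_count(normal_dims=0, hierarchy_lengths=(), joint_groups=0):
    """Count non-empty dimension combinations in one aggregation group.

    Simplified model (an assumption, not Kylin's implementation):
      - a normal dimension is either present or absent (2 choices)
      - a hierarchy of k levels allows its k prefixes, or nothing (k + 1)
      - a joint group appears all together or not at all (2 choices)
    """
    combos = (2 ** normal_dims
              * prod(k + 1 for k in hierarchy_lengths)
              * 2 ** joint_groups)
    return combos - 1  # drop the empty combination

# Six independent dimensions, no rules.
print(group_cuboid_count(normal_dims=6))                              # 63
# Same six dimensions, but three of them form one hierarchy.
print(group_cuboid_count(normal_dims=3, hierarchy_lengths=(3,)))      # 31
# Three-level hierarchy plus the other three dimensions as one joint group.
print(group_cuboid_count(hierarchy_lengths=(3,), joint_groups=1))     # 7
```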

Re: Cube tuning

Posted by peter zhang <pe...@gmail.com>.

Billy, thanks very much for your quick reply.

In my case, I logically split the aggregation groups by dimension category.
As you can see in my schema JSON, all the date-related columns are in one
group, all the payment-way-related dimensions are in another group, all the
other junk dimensions that are not used frequently are in a group defined
as a joint group, and so on (there are 6 groups in my setting).
As I understand your explanation, *is it more reasonable to put the
dimensions that are often used together in one query into the same group?
For example, I often query payment way by day, so the payment way dimension
and the date dimension should be put in the same group.*

Another big question: there can be multiple hierarchies / joint dimensions
in one group, so why do multiple aggregation groups exist? In other
words, *we could define multiple hierarchy dimensions in one group rather
than creating multiple groups.*

Re: Cube tuning

Posted by Billy Liu <bi...@apache.org>.

Suppose you have N dimensions, all in one aggregation group; then the total
number of cuboids will be 2^N.
But if you split them into groups of N1, N2 and N3 dimensions, where
N1+N2+N3>=N, then the total number of cuboids will be 2^N1+2^N2+2^N3.
You can see how big an improvement this could be.
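
As a quick illustration of that arithmetic, here is a small Python sketch.
The dimension count of 15 and the 5/5/5 split are made-up numbers, not taken
from the cube in this thread, and the split total is an upper bound because
groups that share dimensions can produce the same cuboid twice.

```python
def full_cuboid_count(n):
    """Cuboids when all n dimensions sit in one aggregation group: 2^n."""
    return 2 ** n

def split_cuboid_count(group_sizes):
    """Upper bound on cuboids when dimensions are split into groups."""
    return sum(2 ** size for size in group_sizes)

# Hypothetical cube: 15 dimensions, split into three groups of 5.
print(full_cuboid_count(15))           # 32768
print(split_cuboid_count([5, 5, 5]))   # 96
```

The same reduction that makes the cube so much smaller also means far fewer
dimension combinations are answered directly from a pre-computed cuboid.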

How to split the aggregation groups depends on what your queries look like.
Maybe you could share with us what kinds of queries you run.
