Posted to dev@kylin.apache.org by zhong zhang <zz...@gmail.com> on 2016/01/22 00:50:50 UTC

how does Kylin decide which cube to use for the SQL query?

Hi All,

After several cubes are built, we put a query in the UI.
How does Kylin decide which cube to use for this query?
My guess is that it is based on the join conditions in the
data model?

If we create two cubes with exactly the same data model
(same join conditions) but with different dimensions and
measures, how does Kylin know which cube to use for
a query?

Best regards,
Zhong

Re: Re: how does Kylin decide which cube to use for the SQL query?

Posted by hongbin ma <ma...@apache.org>.

Why don't you use cube segments instead of two separate cubes?
Cubes that aren't able to answer the user's query will not be selected.
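
A minimal sketch of the segment idea, assuming each segment of one cube
covers a half-open date range and a query only needs the segments that
overlap its time filter. The names Segment and overlapping_segments are
hypothetical and are not part of Kylin's API; this is a toy model of the
reasoning, not Kylin's implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one time slice of a single cube."""
    name: str
    start: date  # inclusive
    end: date    # exclusive


def overlapping_segments(segments, query_start, query_end):
    """Return the segments whose [start, end) range intersects the query range."""
    return [s for s in segments if s.start < query_end and query_start < s.end]


# Two yearly segments of one cube, mirroring the 2013 and 2014 builds
# discussed in this thread.
segments = [
    Segment("2013", date(2013, 1, 1), date(2014, 1, 1)),
    Segment("2014", date(2014, 1, 1), date(2015, 1, 1)),
]

# A query restricted to 2014 touches only the 2014 segment.
one_year = overlapping_segments(segments, date(2014, 1, 1), date(2015, 1, 1))
print([s.name for s in one_year])

# A query spanning 2013 to 2015 touches both segments, with no merge needed.
two_years = overlapping_segments(segments, date(2013, 1, 1), date(2015, 1, 1))
print([s.name for s in two_years])
```

Under this model, a query that spans both years is answered from both
segments of the same cube, so no choice between two cubes is needed.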

On Tue, Jan 26, 2016 at 12:27 AM, zhong zhang <zz...@gmail.com> wrote:

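> So we decide to build the cube year by year: say we build cubeA from 2014
> to 2015 and cubeB from 2013 to 2014. When the query asks for something
> from 2014 to 2015, does the selection algorithm correctly select
> cubeA (2014 to 2015)? What if the query asks for something
> from 2013 to 2015? Should we merge cubeA and cubeB?
>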

-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Re: how does Kylin decide which cube to use for the SQL query?

Posted by zhong zhang <zz...@gmail.com>.

Hi Hongbin,

Can you give a slightly more detailed explanation of the cube selection
algorithm?
In a project, two cubes are created with the same data model. When I run
a query, how does Kylin select the correct cube? Is there any possibility
that Kylin selects the wrong cube (for example, a cube that does not include
a dimension that the query actually uses)?

Suppose the whole dataset is so big that we cannot build the cube in one
pass (from the start time to the end time of the dataset). So we decide to
build the cube year by year: say we build cubeA from 2014 to 2015 and cubeB
from 2013 to 2014. When the query asks for something from 2014 to 2015,
does the selection algorithm correctly select cubeA (2014 to 2015)? What if
the query asks for something from 2013 to 2015? Should we merge cubeA and
cubeB?

Is there a way to force Kylin to use a particular cube?

Best regards,
Zhong

On Fri, Jan 22, 2016 at 2:53 AM, hongbin ma <ma...@apache.org> wrote:

> I see, so cube selection should prefer cubes whose rowkey order is a
> better fit for the current query.
>
> Any other scenarios?

Re: Re: how does Kylin decide which cube to use for the SQL query?

Posted by hongbin ma <ma...@apache.org>.

I see, so cube selection should prefer cubes whose rowkey order is a
better fit for the current query.

Any other scenarios?

Re: Re: how does Kylin decide which cube to use for the SQL query?

Posted by "13802880779@139.com" <13...@139.com>.

We have a case like this:
CubeA: date_id, hour_id, service_type, user, count1, count2, ...
The rowkey sequence is: date_id+hour_id+service_type+user
This works well when I select all the users who use serviceA, but if we want to find all the services that userA used, it becomes very slow.
So we created another cube, cubeB, in which everything is the same except the rowkey sequence:
CubeB: date_id+hour_id+user+service_type

Now the problem: if I put cubeB in the same project as cubeA, the queries that were fast on cubeA become very slow, so we had to create two projects!
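
To illustrate why the rowkey sequence matters here, the following is a
toy model only: it assumes rows are stored sorted by the composite
rowkey and that a filter narrows the scan only while it fixes a leading
run of rowkey dimensions. The dimension values and the helper names
build_rows and rows_scanned are made up for illustration, and the model
ignores any other scan optimizations a real storage engine may apply.

```python
from bisect import bisect_left
from itertools import product

# Made-up dimension values, just large enough to show the effect.
DIMS = {
    "date_id": ["20160101"],
    "hour_id": ["00", "01"],
    "service_type": ["s0", "s1", "s2"],
    "user": ["u0", "u1", "u2", "u3"],
}


def build_rows(rowkey_order):
    """All dimension combinations as tuples, sorted in rowkey order."""
    return sorted(product(*(DIMS[d] for d in rowkey_order)))


def rows_scanned(rowkey_order, filters):
    """Count rows in the contiguous range fixed by the leading filtered dimensions."""
    prefix = []
    for dim in rowkey_order:
        if dim not in filters:
            break  # the first unfiltered dimension ends the usable prefix
        prefix.append(filters[dim])
    prefix = tuple(prefix)
    rows = build_rows(rowkey_order)
    lo = bisect_left(rows, prefix)
    hi = lo
    while hi < len(rows) and rows[hi][:len(prefix)] == prefix:
        hi += 1
    return hi - lo


cube_a = ["date_id", "hour_id", "service_type", "user"]
cube_b = ["date_id", "hour_id", "user", "service_type"]

# "Which services did u1 use in hour 00?"
query = {"date_id": "20160101", "hour_id": "00", "user": "u1"}

print(rows_scanned(cube_a, query))  # 12: every service_type x user row in that hour
print(rows_scanned(cube_b, query))  # 3: only the rows for u1
```

In this model the same query reads four times as many rows under
CubeA's order as under CubeB's, which is why picking the cube with the
better-suited rowkey order matters.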

From: hongbin ma
Date: 2016-01-22 15:13
To: dev
Subject: Re: how does Kylin decide which cube to use for the SQL query?
This is an area where Kylin can improve.

I opened a ticket, KYLIN-1358 - revisit on cube selection within same project
<https://issues.apache.org/jira/browse/KYLIN-1358>. Please comment with what
you're expecting, and let's discuss how to improve it.

Re: how does Kylin decide which cube to use for the SQL query?

Posted by hongbin ma <ma...@apache.org>.

This is an area where Kylin can improve.

I opened a ticket, KYLIN-1358 - revisit on cube selection within same project
<https://issues.apache.org/jira/browse/KYLIN-1358>. Please comment with what
you're expecting, and let's discuss how to improve it.

On Fri, Jan 22, 2016 at 8:59 AM, 13802880779@139.com <13...@139.com>
wrote:

> Kylin evaluates the cost and selects the best option, but in our case the
> evaluation is far from perfect, so we had to create another project and
> cube.

Re: how does Kylin decide which cube to use for the SQL query?

Posted by "13802880779@139.com" <13...@139.com>.

Kylin evaluates the cost and selects the best option, but in our case the evaluation is far from perfect, so we had to create another project and cube.
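
A rough sketch of the two ideas mentioned in this thread: cubes that
cannot answer the query are filtered out, and a cost is used to choose
among the rest. This is not Kylin's actual algorithm. The Cube class,
the select_cube function and the cost numbers are all hypothetical,
chosen only to make the reasoning concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    """Hypothetical cube description; 'cost' is a made-up number, lower is better."""
    name: str
    dimensions: frozenset
    measures: frozenset
    cost: int


def select_cube(cubes, query_dims, query_measures):
    """Pick the cheapest cube that covers every dimension and measure in the query."""
    capable = [
        c for c in cubes
        if query_dims <= c.dimensions and query_measures <= c.measures
    ]
    if not capable:
        return None  # no cube can answer this query
    return min(capable, key=lambda c: c.cost)


cubes = [
    Cube("cubeA", frozenset({"date_id", "service_type"}),
         frozenset({"count1"}), cost=10),
    Cube("cubeB", frozenset({"date_id", "service_type", "user"}),
         frozenset({"count1", "count2"}), cost=50),
]

# Both cubes can answer this query, so the cheaper one wins.
print(select_cube(cubes, {"date_id"}, {"count1"}).name)

# Only cubeB has the "user" dimension, so cubeA is never considered.
print(select_cube(cubes, {"date_id", "user"}, {"count1"}).name)
```

With a single static cost per cube, two cubes that differ only in
rowkey order would tie or always resolve the same way, which matches
the limitation described above.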

China Mobile Guangdong Co., Ltd., Network Management Center, Liang Meng (梁猛)
13802880779@139.com
