Posted to dev@kylin.apache.org by wangfx <94...@qq.com> on 2019/06/11 01:56:03 UTC

Question about cube rowkey ordering

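My cube has a number of dimensions. Among them, a and b are mandatory dimensions that always appear in the WHERE clause, and b has very low cardinality (only 3 distinct values). c and d never appear in WHERE; they appear only in SELECT and GROUP BY. The cardinalities are c > d > a > b. The remaining dimensions are regular dimensions used in WHERE. How should a, b, c, d and the other dimensions be ordered in the rowkey?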

Re: Question about cube rowkey ordering

Posted by JiaTao Tao <ta...@gmail.com>.
This link (https://www.slideshare.net/YangLi43/design-cube-in-apache-kylin),
which Shaofeng shared previously, is also very helpful; see the chapter
"The Order of Dimensions".

-- 


Regards!

Aron Tao



On Tue, Jun 11, 2019 at 2:45 AM, Xiaoxiang Yu <xi...@kyligence.io> wrote:

> Hi, wangfx
>
> Kylin translates a SQL query into a range scan in HBase, defined by two
> parameters: a start key and an end key.
> A well-designed rowkey lets the scan filter and locate data more
> effectively, which reduces I/O and speeds up queries, so the order of the
> dimensions in the rowkey has a significant impact on query performance.
>
> The following two principles should be combined when adjusting the rowkey
> order:
> 1. Dimensions that are used as filter conditions in queries go before
> dimensions that are not used as filters.
> 2. Dimensions with higher cardinality go before dimensions with lower
> cardinality.
>
> So in your situation, I suggest the order a, b, c, d (if you have only
> these four dimensions): a and b are filter dimensions, so they come first,
> with a ahead of b because of its higher cardinality; c and d follow, again
> in descending order of cardinality.
>
> This link may also help:
> https://kyligence.io/zh/blog/apache-kylin-optimizer-kybot-rowkey/
>
>
> ----------------
> Best wishes,
> Xiaoxiang Yu
>
>
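> On 2019/6/11 09:56, "wangfx" <94...@qq.com> wrote:
>
>
> My cube has a number of dimensions. Among them, a and b are mandatory
> dimensions that always appear in the WHERE clause, and b has very low
> cardinality (only 3 distinct values). c and d never appear in WHERE; they
> appear only in SELECT and GROUP BY. The cardinalities are c > d > a > b.
> The remaining dimensions are regular dimensions used in WHERE. How should
> a, b, c, d and the other dimensions be ordered in the rowkey?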
>
>

Re: Question about cube rowkey ordering

Posted by Xiaoxiang Yu <xi...@kyligence.io>.
Hi, wangfx

Kylin translates a SQL query into a range scan in HBase, defined by two parameters: a start key and an end key.
A well-designed rowkey lets the scan filter and locate data more effectively, which reduces I/O and speeds up queries, so the order of the dimensions in the rowkey has a significant impact on query performance.
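
To make the effect of dimension order on a range scan concrete, here is a toy sketch in Python. It is not Kylin or HBase code: rowkeys are modelled as plain sorted tuples, the helper rows_scanned and the tiny value ranges are made up for illustration, and real rowkey encoding and any server-side filtering are not modelled.

```python
from bisect import bisect_left, bisect_right
from itertools import product


def rows_scanned(rowkeys, start_key, end_key):
    """Count the rows a range scan touches between start_key and end_key (inclusive)."""
    keys = sorted(rowkeys)
    return bisect_right(keys, end_key) - bisect_left(keys, start_key)


# Hypothetical toy data: dimension a has 4 values, dimension c has 5 values.
a_values = range(4)
c_values = range(5)

# Layout 1: the filtered dimension a leads the rowkey, i.e. (a, c).
filter_first = [(a, c) for a, c in product(a_values, c_values)]
# Layout 2: the unfiltered dimension c leads the rowkey, i.e. (c, a).
filter_last = [(c, a) for a, c in product(a_values, c_values)]

# Query: WHERE a = 2, with no condition on c.
# With a in front, the matching rows form one contiguous key range.
scanned_first = rows_scanned(filter_first, (2, min(c_values)), (2, max(c_values)))
# With c in front, the range must span every value of c to reach all rows with a = 2.
scanned_last = rows_scanned(filter_last, (min(c_values), 2), (max(c_values), 2))

print(scanned_first, scanned_last, len(filter_first))
```

In this toy model only 5 of the 20 rows match the filter. The (a, c) layout scans exactly those rows, while the (c, a) layout has to walk through most of the table to find them.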

The following two principles should be combined when adjusting the rowkey order:
1. Dimensions that are used as filter conditions in queries go before dimensions that are not used as filters.
2. Dimensions with higher cardinality go before dimensions with lower cardinality.

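So in your situation, I suggest the order a, b, c, d (if you have only these four dimensions): a and b are filter dimensions, so they come first, with a ahead of b because of its higher cardinality; c and d follow, again in descending order of cardinality.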
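
The two principles can be sketched as a simple sort: filter dimensions first, then descending cardinality within each group. This is only an illustration of the rule of thumb, not something Kylin does automatically. The Dimension class, the function suggest_rowkey_order, and the cardinality numbers are assumptions. Only b = 3 and the ordering c > d > a > b come from the question.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    cardinality: int      # number of distinct values (hypothetical except for b)
    used_in_filter: bool  # True if the dimension appears in WHERE


def suggest_rowkey_order(dimensions):
    """Filter dimensions first; within each group, higher cardinality first."""
    ranked = sorted(
        dimensions,
        key=lambda d: (not d.used_in_filter, -d.cardinality),
    )
    return [d.name for d in ranked]


# Made-up cardinalities that respect c > d > a > b from the question.
dims = [
    Dimension("c", 100_000, used_in_filter=False),
    Dimension("d", 10_000, used_in_filter=False),
    Dimension("a", 1_000, used_in_filter=True),
    Dimension("b", 3, used_in_filter=True),
]

order = suggest_rowkey_order(dims)
print(order)  # ['a', 'b', 'c', 'd']
```

Under these two rules alone, any additional regular filter dimensions would sort into the filter group by cardinality, ahead of c and d.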

This link may also help: https://kyligence.io/zh/blog/apache-kylin-optimizer-kybot-rowkey/


----------------
Best wishes,
Xiaoxiang Yu 
 

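On 2019/6/11 09:56, "wangfx" <94...@qq.com> wrote:

    My cube has a number of dimensions. Among them, a and b are mandatory dimensions that always appear in the WHERE clause, and b has very low cardinality (only 3 distinct values). c and d never appear in WHERE; they appear only in SELECT and GROUP BY. The cardinalities are c > d > a > b. The remaining dimensions are regular dimensions used in WHERE. How should a, b, c, d and the other dimensions be ordered in the rowkey?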