Posted to dev@kylin.apache.org by "Billy(Yiming) Liu" <li...@gmail.com> on 2016/11/17 06:05:09 UTC

Re: some confuse about Mandatory Dimensions

If you set A, B, and C as mandatory dimensions, Kylin saves the cuboid
result grouped by A, B, C internally. That does not mean you can only
query by grouping on A, B, C. If you query by only A and B, the final
result is produced by post-aggregating the above cuboid, and the same
applies to a query grouping by A alone. The cost is performance, since more
post-aggregation is needed. But if you query by grouping on D, there would
be no result, since the mandatory dimensions are missing.
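
As a rough illustration of the post-aggregation described above, here is a minimal Python sketch. It is not Kylin code: the cuboid contents, the SUM measure, and the helper name post_aggregate are all made up for the example. It only shows how rows stored per (A, B, C) can be rolled up to answer a query that groups by fewer dimensions.

```python
from collections import defaultdict

# Hypothetical rows of the (A, B, C) cuboid:
# dimension values -> pre-aggregated SUM measure (values are invented).
cuboid_abc = {
    ("a1", "b1", "c1"): 10,
    ("a1", "b1", "c2"): 5,
    ("a1", "b2", "c1"): 7,
    ("a2", "b1", "c1"): 3,
}

def post_aggregate(cuboid, keep):
    """Roll the cuboid up to the dimension positions listed in `keep`."""
    result = defaultdict(int)
    for dims, value in cuboid.items():
        key = tuple(dims[i] for i in keep)  # drop the dimensions not queried
        result[key] += value                # an additive measure can simply be summed
    return dict(result)

by_ab = post_aggregate(cuboid_abc, keep=(0, 1))  # query grouping by A, B
by_a = post_aggregate(cuboid_abc, keep=(0,))     # query grouping by A only
print(by_ab)
print(by_a)
```

The fewer dimensions the query keeps, the more stored rows have to be folded together at query time, which is the performance cost mentioned above.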

2016-11-17 13:31 GMT+08:00 张晓明(zhangxiaoming)-技术产品中心 <zhangxiaoming@qiyi.com>:

> Hi all,
>
> I have created a cube in my system with mandatory dimensions A, B, and C,
> and a count distinct measure on field “u” that uses HLL.
>
> After the cube segment was built, I queried it with Kylin SQL such as
> “select count(distinct u) from table where A=xxx and B=yyy” or “select
> count(distinct u) from table where A=xxx”. The results are correct.
>
> I expected that a Kylin query would only work when all of the conditions
> are set (A=xxx, B=yyyy, C=zzz).
>
> My question is: how does Kylin answer these queries and still return the
> correct distinct count? It seems unbelievable.
>



-- 
With Warm regards

Yiming Liu (刘一鸣)

Re: some confuse about Mandatory Dimensions

Posted by ShaoFeng Shi <sh...@apache.org>.
xiaoming, Kylin saves the HyperLogLog or Bitmap for the distinct count
measure (not just a number!), which means they are mergeable for complex
queries. So even if you mark A+B+C as mandatory, when you query for a
sub-combination such as A, Kylin merges those HLLs or Bitmaps again to
return what you want.
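
To make the "not just a number" point concrete, here is a small Python sketch. It is only an illustration, not Kylin's implementation: plain Python sets stand in for the stored Bitmap, and the cuboid rows and function names are invented. Merging the stored structures gives the right distinct count, while adding per-row counts would over-count values of “u” that appear in more than one row.

```python
# Hypothetical (A, B, C) cuboid rows. Each row keeps the set of distinct "u"
# values (a stand-in for a bitmap), not just the count.
cuboid_abc = {
    ("a1", "b1", "c1"): {"u1", "u2"},
    ("a1", "b1", "c2"): {"u2", "u3"},
    ("a1", "b2", "c1"): {"u1"},
    ("a2", "b1", "c1"): {"u4"},
}

def count_distinct(cuboid, a):
    """Answer: select count(distinct u) ... where A = a, by merging the sets."""
    merged = set()
    for (row_a, _b, _c), users in cuboid.items():
        if row_a == a:
            merged |= users  # union removes values seen in several rows
    return len(merged)

def naive_sum(cuboid, a):
    """What you would get if only the per-row counts were stored and added."""
    return sum(len(users) for (row_a, _b, _c), users in cuboid.items() if row_a == a)

print(count_distinct(cuboid_abc, "a1"))  # merged distinct count
print(naive_sum(cuboid_abc, "a1"))       # over-counted total
```

An HLL sketch can be merged in a similar way, with the difference that the merged result is an estimate rather than an exact count.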

2016-11-17 14:18 GMT+08:00 张晓明(zhangxiaoming)-技术产品中心 <zhangxiaoming@qiyi.com>:

> Thanks Billy.
>
> If Kylin saves the result separately by A, B, C, I can understand how a
> plain count works. But “count distinct” has to merge identical “u” values,
> so the partial results can't simply be added up.



-- 
Best regards,

Shaofeng Shi 史少锋

RE: some confuse about Mandatory Dimensions

Posted by 张晓明(zhangxiaoming)-技术产品中心 <zh...@qiyi.com>.
Thanks Billy.
If Kylin saves the result separately by A, B, C, I can understand how a plain count works. But “count distinct” has to merge identical “u” values, so the partial results can't simply be added up.
