You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by Kang-Sen Lu <kl...@anovadata.com> on 2019/01/18 14:13:25 UTC

question about how to configure TOPN aggregation function

I am running kylin 2.5.1. I have a question about topn aggregation function usage. Because I did not find document about how to configure TOPN aggregation function, so I am not sure if the problem I am facing is expected or a bug.


Here is my test case:


one data model, and one cube configured.
in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
No SUM(HITS) was configured in the cube.
Built one hour of cube data.
Issued the following query:
select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;

The query had "null" Exception.

2019-01-18 08:58:28,740 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 : CUBE[name=ma_aggs_cube_5] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
2019-01-18 08:58:28,767 ERROR [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 : Exception while executing query
java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
        at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
        at org.apache.calcite.avatica.Helper.createException(Helper.java:41)

My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is configured in a cube, is it necessary to also configure the "SUM(HITS)" in the cube?


Kang-sen



Re: question about how to configure TOPN aggregation function

Posted by JiaTao Tao <ta...@gmail.com>.
Hi, Kang sen,

Never mind.

And for the "user guide" kind of document", I believe that the document
link given by Shaofeng is for the end user. And the blog I mentioned is an
extended reading that helps you better understanding Kylin's TOP_N. An end
user should read documents first finding how to use it and then go extended
reading.



Kang-Sen Lu <kl...@anovadata.com> 于2019年1月19日周六 下午12:59写道:

> Hi, Jia Tao:
>
>
> I am sorry I typed your name incorrectly by accident. I was composing the
> email on amazon kindle's silk browser. It has a "feature" to automatically
> correct any typo. So, when I typed "Jia Tao", the software thinks I had a
> typo, and automatically changed it to "His Tao". I should have
> proof-reading before I hit send button. As most of the engineers, I did not
> have patience to proof-reading.
>
>
> Kang-sen
>
>
> ------------------------------
> *From:* Kang-Sen Lu <kl...@anovadata.com>
> *Sent:* Saturday, January 19, 2019 2:14:10 AM
> *To:* user@kylin.apache.org
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> Hi, His Tao:
>
> Thanks for your reply.
>
> I had read the article you mentioned sometime ago. It is a very
> informative article about the technical aspects of top n measure feature.
> It describes the purpose and principle of top n aggregation.
> However, it does not talk about how the end user will provide top n
> configuration. So user must guess how to use nylon-fui to configure cube
> when he/she wants to take advantage of this good feature.
> Unless you are thinking from user's petspective, you won't find it is
> useful to create a "user guide" kind of document.
> Kang-sen
> ------------------------------
> *From:* JiaTao Tao <ta...@gmail.com>
> *Sent:* Friday, January 18, 2019 10:04:51 PM
> *To:* user@kylin.apache.org
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> Hi Kang-sen
> There's a blog written by Shaofeng and it's good for understanding TOPN_N
> in Kylin, may it helps:
> http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
> Apache Kylin | Approximate Top-N support in Kylin
> <http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>
> kylin.apache.org
> Approximate Top-N support in Kylin. Mar 19, 2016 • Shaofeng Shi.
> Background. Find the Top-N (or Top-K) entities from a dataset is a common
> scenario and requirement in data minding; We often see the reports or news
> like “Top 100 companies in the world”, “Most popular 20 electronics” sold
> on a big e-commerce platform, etc. Exploring and analysising the top
> entities can always find some ...
>
> And for the typical scenario, we shouldn't define the TOP_N's group by col
> in Kyiln's dimensions, or it will no benefits for TOP N query.
>
> Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午2:48写道:
>
> Through several experiments, I have learned that (1) when a TOPN(SUM(x),
> group-by y) is configured, currently I must configure SUM(x) as well. But
> there is a bug addressing this problem (KYLIN-3322). In the aggregation
> froup configuration, the dim y does NOT have to be configured in any agg.
>
> (2) To prevent kylin from routing a non-topn query to the topn cube, bug
> KYLIN-2620 is touching the bases with this issue. I am hoping the bug fix
> for KYLIN-2620 will be able to accurately dis-qualify a non-topn query from
> being routed to a topn cube, by recognizing that the query does not include
> ORDER-BY and LIMIT info.
>
>
> Currently, I have hacked the topn cost multiplier function to get by this
> issue. Because I am not fully aware of how KYLIN works, the hack may cause
> problem in an unexpected way. So when a true solution is available, I will
> switch over.
>
>
> Thanks.
>
>
> Kang-sen
> ------------------------------
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, January 18, 2019 9:38:42 AM
> *To:* user
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> Hi Kang sen,
>
> It is in the cube tutorial, please search "TOP_N" in it:
>
> https://kylin.apache.org/docs/tutorial/create_cube.html
>
> We will investigate this bug in next week; If you have any findings,
> please share with us; also, a patch or PR is welcomed!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng.shi@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:31写道:
>
> Hi, ShaoFeng:
>
>
> Thanks for the info. So what I found is a bug in kylin.
>
>
> I am curious if there are any tutorial about hwo to use KYLIN GUI to
> configure the TOPN measure, i.e what is the minimum info being configured
> to make it work?
>
>
> I can see in sample project json files how kylin expects the cube
> configuration. But how does a user using KYLIN GUI to accomplish the same
> effect is not clear.
>
>
> Kang-sen
> ------------------------------
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, January 18, 2019 9:16:47 AM
> *To:* user
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> In theory, it doesn't need a separate SUM() measure;
>
> Your issue seems to be the same as:
>
> https://issues.apache.org/jira/browse/KYLIN-3322
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng.shi@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:13写道:
>
> I am running kylin 2.5.1. I have a question about topn aggregation
> function usage. Because I did not find document about how to configure TOPN
> aggregation function, so I am not sure if the problem I am facing is
> expected or a bug.
>
>
> Here is my test case:
>
>
> one data model, and one cube configured.
> in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
> No SUM(HITS) was configured in the cube.
> Built one hour of cube data.
> Issued the following query:
> select SUBSCRIBER_ID, sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;
>
> The query had "null" Exception.
>
> 2019-01-18 08:58:28,740 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class
> org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 :
> CUBE[name=ma_aggs_cube_5] priority 1 cost 279.
> CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class org.apache.kylin.query.routing.rules.RealizationSortRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The
> realizations remaining:
> [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final
> chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
> 2019-01-18 08:58:28,767 ERROR [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 :
> Exception while executing query
> java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID,
> sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>
> My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is
> configured in a cube, is it necessary to also configure the "SUM(HITS)" in
> the cube?
>
>
> Kang-sen
>
>
>
>
>
> --
>
>
> Regards!
>
> Aron Tao
>


-- 


Regards!

Aron Tao

Re: question about how to configure TOPN aggregation function

Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, Jia Tao:


I am sorry I typed your name incorrectly by accident. I was composing the email on amazon kindle's silk browser. It has a "feature" to automatically correct any typo. So, when I typed "Jia Tao", the software thinks I had a typo, and automatically changed it to "His Tao". I should have proof-reading before I hit send button. As most of the engineers, I did not have patience to proof-reading.


Kang-sen


________________________________
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Saturday, January 19, 2019 2:14:10 AM
To: user@kylin.apache.org
Subject: Re: question about how to configure TOPN aggregation function

Hi, His Tao:

Thanks for your reply.

I had read the article you mentioned sometime ago. It is a very informative article about the technical aspects of top n measure feature. It describes the purpose and principle of top n aggregation.
However, it does not talk about how the end user will provide top n configuration. So user must guess how to use nylon-fui to configure cube when he/she wants to take advantage of this good feature.
Unless you are thinking from user's petspective, you won't find it is useful to create a "user guide" kind of document.
Kang-sen
________________________________
From: JiaTao Tao <ta...@gmail.com>
Sent: Friday, January 18, 2019 10:04:51 PM
To: user@kylin.apache.org
Subject: Re: question about how to configure TOPN aggregation function

Hi Kang-sen
There's a blog written by Shaofeng and it's good for understanding TOPN_N in Kylin, may it helps:
http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
Apache Kylin | Approximate Top-N support in Kylin<http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>
kylin.apache.org
Approximate Top-N support in Kylin. Mar 19, 2016 • Shaofeng Shi. Background. Find the Top-N (or Top-K) entities from a dataset is a common scenario and requirement in data minding; We often see the reports or news like “Top 100 companies in the world”, “Most popular 20 electronics” sold on a big e-commerce platform, etc. Exploring and analysising the top entities can always find some ...


And for the typical scenario, we shouldn't define the TOP_N's group by col in Kyiln's dimensions, or it will no benefits for TOP N query.

Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午2:48写道:

Through several experiments, I have learned that (1) when a TOPN(SUM(x), group-by y) is configured, currently I must configure SUM(x) as well. But there is a bug addressing this problem (KYLIN-3322). In the aggregation froup configuration, the dim y does NOT have to be configured in any agg.

(2) To prevent kylin from routing a non-topn query to the topn cube, bug KYLIN-2620 is touching the bases with this issue. I am hoping the bug fix for KYLIN-2620 will be able to accurately dis-qualify a non-topn query from being routed to a topn cube, by recognizing that the query does not include ORDER-BY and LIMIT info.


Currently, I have hacked the topn cost multiplier function to get by this issue. Because I am not fully aware of how KYLIN works, the hack may cause problem in an unexpected way. So when a true solution is available, I will switch over.


Thanks.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Friday, January 18, 2019 9:38:42 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

Hi Kang sen,

It is in the cube tutorial, please search "TOP_N" in it:

https://kylin.apache.org/docs/tutorial/create_cube.html

We will investigate this bug in next week; If you have any findings, please share with us; also, a patch or PR is welcomed!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:31写道:

Hi, ShaoFeng:


Thanks for the info. So what I found is a bug in kylin.


I am curious if there are any tutorial about hwo to use KYLIN GUI to configure the TOPN measure, i.e what is the minimum info being configured to make it work?


I can see in sample project json files how kylin expects the cube configuration. But how does a user using KYLIN GUI to accomplish the same effect is not clear.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Friday, January 18, 2019 9:16:47 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

In theory, it doesn't need a separate SUM() measure;

Your issue seems to be the same as:

https://issues.apache.org/jira/browse/KYLIN-3322

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:13写道:

I am running kylin 2.5.1. I have a question about topn aggregation function usage. Because I did not find document about how to configure TOPN aggregation function, so I am not sure if the problem I am facing is expected or a bug.


Here is my test case:


one data model, and one cube configured.
in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
No SUM(HITS) was configured in the cube.
Built one hour of cube data.
Issued the following query:
select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;

The query had "null" Exception.

2019-01-18 08:58:28,740 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 : CUBE[name=ma_aggs_cube_5] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
2019-01-18 08:58:28,767 ERROR [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 : Exception while executing query
java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
        at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
        at org.apache.calcite.avatica.Helper.createException(Helper.java:41)

My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is configured in a cube, is it necessary to also configure the "SUM(HITS)" in the cube?


Kang-sen




--


Regards!

Aron Tao

Re: question about how to configure TOPN aggregation function

Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, His Tao:

Thanks for your reply.

I had read the article you mentioned sometime ago. It is a very informative article about the technical aspects of top n measure feature. It describes the purpose and principle of top n aggregation.
However, it does not talk about how the end user will provide top n configuration. So user must guess how to use nylon-fui to configure cube when he/she wants to take advantage of this good feature.
Unless you are thinking from user's petspective, you won't find it is useful to create a "user guide" kind of document.
Kang-sen
________________________________
From: JiaTao Tao <ta...@gmail.com>
Sent: Friday, January 18, 2019 10:04:51 PM
To: user@kylin.apache.org
Subject: Re: question about how to configure TOPN aggregation function

Hi Kang-sen
There's a blog written by Shaofeng and it's good for understanding TOPN_N in Kylin, may it helps:
http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/

And for the typical scenario, we shouldn't define the TOP_N's group by col in Kyiln's dimensions, or it will no benefits for TOP N query.

Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午2:48写道:

Through several experiments, I have learned that (1) when a TOPN(SUM(x), group-by y) is configured, currently I must configure SUM(x) as well. But there is a bug addressing this problem (KYLIN-3322). In the aggregation froup configuration, the dim y does NOT have to be configured in any agg.

(2) To prevent kylin from routing a non-topn query to the topn cube, bug KYLIN-2620 is touching the bases with this issue. I am hoping the bug fix for KYLIN-2620 will be able to accurately dis-qualify a non-topn query from being routed to a topn cube, by recognizing that the query does not include ORDER-BY and LIMIT info.


Currently, I have hacked the topn cost multiplier function to get by this issue. Because I am not fully aware of how KYLIN works, the hack may cause problem in an unexpected way. So when a true solution is available, I will switch over.


Thanks.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Friday, January 18, 2019 9:38:42 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

Hi Kang sen,

It is in the cube tutorial, please search "TOP_N" in it:

https://kylin.apache.org/docs/tutorial/create_cube.html

We will investigate this bug in next week; If you have any findings, please share with us; also, a patch or PR is welcomed!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:31写道:

Hi, ShaoFeng:


Thanks for the info. So what I found is a bug in kylin.


I am curious if there are any tutorial about hwo to use KYLIN GUI to configure the TOPN measure, i.e what is the minimum info being configured to make it work?


I can see in sample project json files how kylin expects the cube configuration. But how does a user using KYLIN GUI to accomplish the same effect is not clear.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Friday, January 18, 2019 9:16:47 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

In theory, it doesn't need a separate SUM() measure;

Your issue seems to be the same as:

https://issues.apache.org/jira/browse/KYLIN-3322

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:13写道:

I am running kylin 2.5.1. I have a question about topn aggregation function usage. Because I did not find document about how to configure TOPN aggregation function, so I am not sure if the problem I am facing is expected or a bug.


Here is my test case:


one data model, and one cube configured.
in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
No SUM(HITS) was configured in the cube.
Built one hour of cube data.
Issued the following query:
select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;

The query had "null" Exception.

2019-01-18 08:58:28,740 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 : CUBE[name=ma_aggs_cube_5] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
2019-01-18 08:58:28,767 ERROR [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 : Exception while executing query
java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
        at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
        at org.apache.calcite.avatica.Helper.createException(Helper.java:41)

My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is configured in a cube, is it necessary to also configure the "SUM(HITS)" in the cube?


Kang-sen




--


Regards!

Aron Tao

Re: question about how to configure TOPN aggregation function

Posted by JiaTao Tao <ta...@gmail.com>.
Hi Kang-sen
There's a blog written by Shaofeng and it's good for understanding TOPN_N
in Kylin, may it helps:
http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/

And for the typical scenario, we shouldn't define the TOP_N's group by col
in Kyiln's dimensions, or it will no benefits for TOP N query.

Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午2:48写道:

> Through several experiments, I have learned that (1) when a TOPN(SUM(x),
> group-by y) is configured, currently I must configure SUM(x) as well. But
> there is a bug addressing this problem (KYLIN-3322). In the aggregation
> froup configuration, the dim y does NOT have to be configured in any agg.
>
> (2) To prevent kylin from routing a non-topn query to the topn cube, bug
> KYLIN-2620 is touching the bases with this issue. I am hoping the bug fix
> for KYLIN-2620 will be able to accurately dis-qualify a non-topn query from
> being routed to a topn cube, by recognizing that the query does not include
> ORDER-BY and LIMIT info.
>
>
> Currently, I have hacked the topn cost multiplier function to get by this
> issue. Because I am not fully aware of how KYLIN works, the hack may cause
> problem in an unexpected way. So when a true solution is available, I will
> switch over.
>
>
> Thanks.
>
>
> Kang-sen
> ------------------------------
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, January 18, 2019 9:38:42 AM
> *To:* user
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> Hi Kang sen,
>
> It is in the cube tutorial, please search "TOP_N" in it:
>
> https://kylin.apache.org/docs/tutorial/create_cube.html
>
> We will investigate this bug in next week; If you have any findings,
> please share with us; also, a patch or PR is welcomed!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng.shi@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:31写道:
>
> Hi, ShaoFeng:
>
>
> Thanks for the info. So what I found is a bug in kylin.
>
>
> I am curious if there are any tutorial about hwo to use KYLIN GUI to
> configure the TOPN measure, i.e what is the minimum info being configured
> to make it work?
>
>
> I can see in sample project json files how kylin expects the cube
> configuration. But how does a user using KYLIN GUI to accomplish the same
> effect is not clear.
>
>
> Kang-sen
> ------------------------------
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, January 18, 2019 9:16:47 AM
> *To:* user
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> In theory, it doesn't need a separate SUM() measure;
>
> Your issue seems to be the same as:
>
> https://issues.apache.org/jira/browse/KYLIN-3322
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng.shi@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:13写道:
>
> I am running kylin 2.5.1. I have a question about topn aggregation
> function usage. Because I did not find document about how to configure TOPN
> aggregation function, so I am not sure if the problem I am facing is
> expected or a bug.
>
>
> Here is my test case:
>
>
> one data model, and one cube configured.
> in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
> No SUM(HITS) was configured in the cube.
> Built one hour of cube data.
> Issued the following query:
> select SUBSCRIBER_ID, sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;
>
> The query had "null" Exception.
>
> 2019-01-18 08:58:28,740 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class
> org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 :
> CUBE[name=ma_aggs_cube_5] priority 1 cost 279.
> CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class org.apache.kylin.query.routing.rules.RealizationSortRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The
> realizations remaining:
> [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final
> chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
> 2019-01-18 08:58:28,767 ERROR [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 :
> Exception while executing query
> java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID,
> sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>
> My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is
> configured in a cube, is it necessary to also configure the "SUM(HITS)" in
> the cube?
>
>
> Kang-sen
>
>
>
>

-- 


Regards!

Aron Tao

Re: question about how to configure TOPN aggregation function

Posted by Kang-Sen Lu <kl...@anovadata.com>.
Through several experiments, I have learned that (1) when a TOPN(SUM(x), group-by y) is configured, currently I must configure SUM(x) as well. But there is a bug addressing this problem (KYLIN-3322). In the aggregation froup configuration, the dim y does NOT have to be configured in any agg.

(2) To prevent kylin from routing a non-topn query to the topn cube, bug KYLIN-2620 is touching the bases with this issue. I am hoping the bug fix for KYLIN-2620 will be able to accurately dis-qualify a non-topn query from being routed to a topn cube, by recognizing that the query does not include ORDER-BY and LIMIT info.


Currently, I have hacked the topn cost multiplier function to get by this issue. Because I am not fully aware of how KYLIN works, the hack may cause problem in an unexpected way. So when a true solution is available, I will switch over.


Thanks.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, January 18, 2019 9:38:42 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

Hi Kang sen,

It is in the cube tutorial, please search "TOP_N" in it:

https://kylin.apache.org/docs/tutorial/create_cube.html

We will investigate this bug in next week; If you have any findings, please share with us; also, a patch or PR is welcomed!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:31写道:

Hi, ShaoFeng:


Thanks for the info. So what I found is a bug in kylin.


I am curious if there are any tutorial about hwo to use KYLIN GUI to configure the TOPN measure, i.e what is the minimum info being configured to make it work?


I can see in sample project json files how kylin expects the cube configuration. But how does a user using KYLIN GUI to accomplish the same effect is not clear.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Friday, January 18, 2019 9:16:47 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

In theory, it doesn't need a separate SUM() measure;

Your issue seems to be the same as:

https://issues.apache.org/jira/browse/KYLIN-3322

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:13写道:

I am running kylin 2.5.1. I have a question about topn aggregation function usage. Because I did not find document about how to configure TOPN aggregation function, so I am not sure if the problem I am facing is expected or a bug.


Here is my test case:


one data model, and one cube configured.
in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
No SUM(HITS) was configured in the cube.
Built one hour of cube data.
Issued the following query:
select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;

The query had "null" Exception.

2019-01-18 08:58:28,740 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 : CUBE[name=ma_aggs_cube_5] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
2019-01-18 08:58:28,767 ERROR [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 : Exception while executing query
java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
        at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
        at org.apache.calcite.avatica.Helper.createException(Helper.java:41)

My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is configured in a cube, is it necessary to also configure the "SUM(HITS)" in the cube?


Kang-sen



Re: question about how to configure TOPN aggregation function

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Kang sen,

It is in the cube tutorial, please search "TOP_N" in it:

https://kylin.apache.org/docs/tutorial/create_cube.html

We will investigate this bug in next week; If you have any findings, please
share with us; also, a patch or PR is welcomed!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:31写道:

> Hi, ShaoFeng:
>
>
> Thanks for the info. So what I found is a bug in kylin.
>
>
> I am curious if there are any tutorial about hwo to use KYLIN GUI to
> configure the TOPN measure, i.e what is the minimum info being configured
> to make it work?
>
>
> I can see in sample project json files how kylin expects the cube
> configuration. But how does a user using KYLIN GUI to accomplish the same
> effect is not clear.
>
>
> Kang-sen
> ------------------------------
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, January 18, 2019 9:16:47 AM
> *To:* user
> *Subject:* Re: question about how to configure TOPN aggregation function
>
> In theory, it doesn't need a separate SUM() measure;
>
> Your issue seems to be the same as:
>
> https://issues.apache.org/jira/browse/KYLIN-3322
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng.shi@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:13写道:
>
> I am running kylin 2.5.1. I have a question about topn aggregation
> function usage. Because I did not find document about how to configure TOPN
> aggregation function, so I am not sure if the problem I am facing is
> expected or a bug.
>
>
> Here is my test case:
>
>
> one data model, and one cube configured.
> in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
> No SUM(HITS) was configured in the cube.
> Built one hour of cube data.
> Issued the following query:
> select SUBSCRIBER_ID, sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;
>
> The query had "null" Exception.
>
> 2019-01-18 08:58:28,740 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class
> org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 :
> CUBE[name=ma_aggs_cube_5] priority 1 cost 279.
> CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class org.apache.kylin.query.routing.rules.RealizationSortRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The
> realizations remaining:
> [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final
> chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
> 2019-01-18 08:58:28,767 ERROR [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 :
> Exception while executing query
> java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID,
> sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>
> My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is
> configured in a cube, is it necessary to also configure the "SUM(HITS)" in
> the cube?
>
>
> Kang-sen
>
>
>
>

Re: question about how to configure TOPN aggregation function

Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, ShaoFeng:


Thanks for the info. So what I found is a bug in kylin.


I am curious if there are any tutorial about hwo to use KYLIN GUI to configure the TOPN measure, i.e what is the minimum info being configured to make it work?


I can see in sample project json files how kylin expects the cube configuration. But how does a user using KYLIN GUI to accomplish the same effect is not clear.


Kang-sen

________________________________
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, January 18, 2019 9:16:47 AM
To: user
Subject: Re: question about how to configure TOPN aggregation function

In theory, it doesn't need a separate SUM() measure;

Your issue seems to be the same as:

https://issues.apache.org/jira/browse/KYLIN-3322

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
<ma...@kyligence.io>
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org<ma...@kylin.apache.org>
Join Kylin dev mail group: dev-subscribe@kylin.apache.org<ma...@kylin.apache.org>




Kang-Sen Lu <kl...@anovadata.com>> 于2019年1月18日周五 下午10:13写道:

I am running kylin 2.5.1. I have a question about topn aggregation function usage. Because I did not find document about how to configure TOPN aggregation function, so I am not sure if the problem I am facing is expected or a bug.


Here is my test case:


one data model, and one cube configured.
in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
No SUM(HITS) was configured in the cube.
Built one hour of cube data.
Issued the following query:
select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;

The query had "null" Exception.

2019-01-18 08:58:28,740 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 : CUBE[name=ma_aggs_cube_5] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
2019-01-18 08:58:28,741 INFO  [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
2019-01-18 08:58:28,767 ERROR [Query d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 : Exception while executing query
java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID, sum(hits)
from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
        at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
        at org.apache.calcite.avatica.Helper.createException(Helper.java:41)

My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is configured in a cube, is it necessary to also configure the "SUM(HITS)" in the cube?


Kang-sen



Re: question about how to configure TOPN aggregation function

Posted by ShaoFeng Shi <sh...@apache.org>.
In theory, it doesn't need a separate SUM() measure;

Your issue seems to be the same as:

https://issues.apache.org/jira/browse/KYLIN-3322

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




Kang-Sen Lu <kl...@anovadata.com> 于2019年1月18日周五 下午10:13写道:

> I am running kylin 2.5.1. I have a question about topn aggregation
> function usage. Because I did not find document about how to configure TOPN
> aggregation function, so I am not sure if the problem I am facing is
> expected or a bug.
>
>
> Here is my test case:
>
>
> one data model, and one cube configured.
> in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured.
> No SUM(HITS) was configured in the cube.
> Built one hour of cube data.
> Issued the following query:
> select SUBSCRIBER_ID, sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ;
>
> The query had "null" Exception.
>
> 2019-01-18 08:58:28,740 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class
> org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 :
> CUBE[name=ma_aggs_cube_5] priority 1 cost 279.
> CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27.
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying
> rule: class org.apache.kylin.query.routing.rules.RealizationSortRule,
> realizations before:
> [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations
> after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]]
> 2019-01-18 08:58:28,741 INFO  [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The
> realizations remaining:
> [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final
> chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test]
> 2019-01-18 08:58:28,767 ERROR [Query
> d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 :
> Exception while executing query
> java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID,
> sum(hits)
> from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY
> SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at
> org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>
> My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is
> configured in a cube, is it necessary to also configure the "SUM(HITS)" in
> the cube?
>
>
> Kang-sen
>
>
>
>