Posted to dev@kylin.apache.org by Billy Liu <bi...@apache.org> on 2017/04/01 01:07:35 UTC
Re: Question regarding topN measure on string column
Group by SUM or group by COUNT is reasonable and supported. There is no
support for ordering by name alphabetically.
2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
> Hi,
>
> I believe it's not supported. Besides, how do you define "order" on a string?
> I don't think it's a reasonable requirement.
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>
Re: Question regarding topN measure on string column
Posted by Shailesh Prajapati <sh...@infoworks.io>.
Thanks for the help. The description and query you provided make sense.
On Sun, Apr 2, 2017 at 8:04 PM, ShaoFeng Shi <sh...@apache.org> wrote:
> The sample SQL was missing product_name in the group by clause; it should be:
>
> select city, product_name, sum(1) as occurancy from fact_table where city
> in ('abc') group by city, product_name order by occurancy desc limit 100;
>
> To get a better understanding of TopN, please check
> https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
--
Shailesh
Re: Question regarding topN measure on string column
Posted by ShaoFeng Shi <sh...@apache.org>.
The sample SQL was missing product_name in the group by clause; it should be:

select city, product_name, sum(1) as occurancy from fact_table where city
in ('abc') group by city, product_name order by occurancy desc limit 100;

To get a better understanding of TopN, please check
https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
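
For anyone who wants to try the corrected query shape without a cube at hand, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for Kylin, so it only illustrates the SQL semantics (grouping by both city and product_name), not how Kylin answers the query from a TopN measure. The sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows (city, product_name); not taken from the thread.
rows = [
    ("abc", "apple"), ("abc", "apple"), ("abc", "apple"),
    ("abc", "pear"), ("abc", "pear"),
    ("abc", "plum"),
    ("xyz", "apple"),
]

# SQLite is only a stand-in here; Kylin would serve this from the cube.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_table (city TEXT, product_name TEXT)")
conn.executemany("INSERT INTO fact_table VALUES (?, ?)", rows)

# Same shape as the corrected query: product_name appears in GROUP BY,
# and the string literal uses single quotes.
query = """
    SELECT city, product_name, SUM(1) AS occurancy
    FROM fact_table
    WHERE city IN ('abc')
    GROUP BY city, product_name
    ORDER BY occurancy DESC
    LIMIT 100
"""
top_products = conn.execute(query).fetchall()
for row in top_products:
    print(row)
```

Each output row is one (city, product_name) pair with its occurrence count, most frequent first.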
2017-04-02 22:33 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:
> Kylin TopN's "sum|order by" supports two options: a) a numeric column, or b)
> the constant 1.
>
> Option b) matches your requirement, as I understand it. You just need to
> define "product_name" as the "group by" column in TopN and the constant 1
> as the "sum|order by" column; don't forget to use "city" as a dimension of
> the cube. Then you can fetch the top products with SQL like:
>
> select city, product_name, sum(1) as occurancy from fact_table where city
> in ('abc') group by city order by occurancy desc limit 100;
>
> If "product_name" is a UHC (ultra-high-cardinality) column, you'd better use
> a non-dictionary encoding method (like "fixed_length") for it.
--
Best regards,
Shaofeng Shi 史少锋
Re: Question regarding topN measure on string column
Posted by ShaoFeng Shi <sh...@apache.org>.
Kylin TopN's "sum|order by" supports two options: a) a numeric column, or b)
the constant 1.

Option b) matches your requirement, as I understand it. You just need to
define "product_name" as the "group by" column in TopN and the constant 1 as
the "sum|order by" column; don't forget to use "city" as a dimension of the
cube. Then you can fetch the top products with SQL like:

select city, product_name, sum(1) as occurancy from fact_table where city
in ('abc') group by city order by occurancy desc limit 100;

If "product_name" is a UHC (ultra-high-cardinality) column, you'd better use
a non-dictionary encoding method (like "fixed_length") for it.
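
To make option b) concrete: summing the constant 1 per product is the same as counting occurrences. Below is a small conceptual sketch in Python, using exact counting on made-up rows. The function name and the data are hypothetical, and this is not how Kylin computes or stores the TopN measure; it only shows what "group by product_name, sum 1, keep the top N" means for one city.

```python
from collections import Counter

def top_products(rows, city, n):
    """Return the n most frequent product names for one city, with counts."""
    counts = Counter()
    for row_city, product_name in rows:
        if row_city == city:
            counts[product_name] += 1  # the "constant 1" being summed
    return counts.most_common(n)

# Hypothetical sample rows (city, product_name).
rows = [
    ("abc", "apple"), ("abc", "pear"), ("abc", "apple"),
    ("abc", "plum"), ("abc", "pear"), ("abc", "apple"),
    ("xyz", "apple"),
]

result = top_products(rows, "abc", 2)
print(result)
```

Here "city" plays the role of the cube dimension used for filtering, and "product_name" plays the role of the TopN "group by" column.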
2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:
> Group by SUM or group by COUNT is reasonable and supported. There is no
> support for ordering by name alphabetically.
--
Best regards,
Shaofeng Shi 史少锋