You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by Billy Liu <bi...@apache.org> on 2017/04/01 01:07:35 UTC

Re: Question regarding topN measure on string column

group by SUM, or group by COUNT is reasonable and supported. There is no
order by name alphabetical support.

2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:

> ​hi,
>
> i believe it's not supported. besides, how do you define "order" on string?
> I don't think it's a reasonable requirement
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>

Re: Question regarding topN measure on string column

Posted by Shailesh Prajapati <sh...@infoworks.io>.
Thanks for the help. The description and query you provided makes sense.

On Sun, Apr 2, 2017 at 8:04 PM, ShaoFeng Shi <sh...@apache.org> wrote:

> The sample SQL missed product_name in group by, it should be:
>
> select city, product_name, sum(1) as occurancy from fact_table where city
> in ("abc") group by city, product_name order by occurancy desc limit 100;
>
> To get a better understanding of TopN, please check
> https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
>
> 2017-04-02 22:33 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:
>
> > Kylin TopN's "sum|order by" supports two options a) a numeric column, b)
> > constant 1.
> >
> > The option b) can match your requirement in my understanding. You just
> > need define "product_name" as the "group by" column in TopN, and
> constant 1
> > as the "sum|order by" column; dont' forget to use "city" as cube's
> > dimension, then you can fetch the top products with SQL like:
> >
> > select city, product_name, sum(1) as occurancy from fact_table where city
> > in ("abc") group by city order by occurancy desc limit 100;
> >
> > If the "product_name" is a UHC column, you'd better use a non-dict
> > encoding (like "fixed_length") method for it.
> >
> >
> >
> > 2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:
> >
> >> group by SUM, or group by COUNT is reasonable and supported. There is no
> >> order by name alphabetical support.
> >>
> >> 2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
> >>
> >> > ​hi,
> >> >
> >> > i believe it's not supported. besides, how do you define "order" on
> >> string?
> >> > I don't think it's a reasonable requirement
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > *Bin Mahone | 马洪宾*
> >> >
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
> >
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>



-- 
Shailesh

Re: Question regarding topN measure on string column

Posted by ShaoFeng Shi <sh...@apache.org>.
The sample SQL missed product_name in group by, it should be:

select city, product_name, sum(1) as occurancy from fact_table where city
in ("abc") group by city, product_name order by occurancy desc limit 100;

To get a better understanding of TopN, please check
https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/

2017-04-02 22:33 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:

> Kylin TopN's "sum|order by" supports two options a) a numeric column, b)
> constant 1.
>
> The option b) can match your requirement in my understanding. You just
> need define "product_name" as the "group by" column in TopN, and constant 1
> as the "sum|order by" column; dont' forget to use "city" as cube's
> dimension, then you can fetch the top products with SQL like:
>
> select city, product_name, sum(1) as occurancy from fact_table where city
> in ("abc") group by city order by occurancy desc limit 100;
>
> If the "product_name" is a UHC column, you'd better use a non-dict
> encoding (like "fixed_length") method for it.
>
>
>
> 2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:
>
>> group by SUM, or group by COUNT is reasonable and supported. There is no
>> order by name alphabetical support.
>>
>> 2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
>>
>> > ​hi,
>> >
>> > i believe it's not supported. besides, how do you define "order" on
>> string?
>> > I don't think it's a reasonable requirement
>> >
>> > --
>> > Regards,
>> >
>> > *Bin Mahone | 马洪宾*
>> >
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


-- 
Best regards,

Shaofeng Shi 史少锋

Re: Question regarding topN measure on string column

Posted by ShaoFeng Shi <sh...@apache.org>.
Kylin TopN's "sum|order by" supports two options a) a numeric column, b)
constant 1.

The option b) can match your requirement in my understanding. You just need
define "product_name" as the "group by" column in TopN, and constant 1 as
the "sum|order by" column; dont' forget to use "city" as cube's dimension,
then you can fetch the top products with SQL like:

select city, product_name, sum(1) as occurancy from fact_table where city
in ("abc") group by city order by occurancy desc limit 100;

If the "product_name" is a UHC column, you'd better use a non-dict encoding
(like "fixed_length") method for it.



2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:

> group by SUM, or group by COUNT is reasonable and supported. There is no
> order by name alphabetical support.
>
> 2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
>
> > ​hi,
> >
> > i believe it's not supported. besides, how do you define "order" on
> string?
> > I don't think it's a reasonable requirement
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋