You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by Shailesh Prajapati <sh...@infoworks.io> on 2017/03/30 06:02:38 UTC

Question regarding topN measure on string column

Hi,

I am using kylin 1.6 and trying to define a topN measure on string column
(in SUM | ORDER BY field), which is giving me Number format exception. I
understand that kylin is expecting a numeric column to do sum operation on
it. But, I just wanted to ask that is there any way to define topN on
string column. Is topN only works with numeric column? Maybe instead of sum
if we can use count on string column.

My use case is like -  Top 100 product names group by City.

-- 
Shailesh

Re: Question regarding topN measure on string column

Posted by Shailesh Prajapati <sh...@infoworks.io>.
Thanks for the help. The description and query you provided makes sense.

On Sun, Apr 2, 2017 at 8:04 PM, ShaoFeng Shi <sh...@apache.org> wrote:

> The sample SQL missed product_name in group by, it should be:
>
> select city, product_name, sum(1) as occurancy from fact_table where city
> in ("abc") group by city, product_name order by occurancy desc limit 100;
>
> To get a better understanding of TopN, please check
> https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
>
> 2017-04-02 22:33 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:
>
> > Kylin TopN's "sum|order by" supports two options a) a numeric column, b)
> > constant 1.
> >
> > The option b) can match your requirement in my understanding. You just
> > need define "product_name" as the "group by" column in TopN, and
> constant 1
> > as the "sum|order by" column; dont' forget to use "city" as cube's
> > dimension, then you can fetch the top products with SQL like:
> >
> > select city, product_name, sum(1) as occurancy from fact_table where city
> > in ("abc") group by city order by occurancy desc limit 100;
> >
> > If the "product_name" is a UHC column, you'd better use a non-dict
> > encoding (like "fixed_length") method for it.
> >
> >
> >
> > 2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:
> >
> >> group by SUM, or group by COUNT is reasonable and supported. There is no
> >> order by name alphabetical support.
> >>
> >> 2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
> >>
> >> > ​hi,
> >> >
> >> > i believe it's not supported. besides, how do you define "order" on
> >> string?
> >> > I don't think it's a reasonable requirement
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > *Bin Mahone | 马洪宾*
> >> >
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
> >
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>



-- 
Shailesh

Re: Question regarding topN measure on string column

Posted by ShaoFeng Shi <sh...@apache.org>.
The sample SQL missed product_name in group by, it should be:

select city, product_name, sum(1) as occurancy from fact_table where city
in ("abc") group by city, product_name order by occurancy desc limit 100;

To get a better understanding of TopN, please check
https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/

2017-04-02 22:33 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:

> Kylin TopN's "sum|order by" supports two options a) a numeric column, b)
> constant 1.
>
> The option b) can match your requirement in my understanding. You just
> need define "product_name" as the "group by" column in TopN, and constant 1
> as the "sum|order by" column; dont' forget to use "city" as cube's
> dimension, then you can fetch the top products with SQL like:
>
> select city, product_name, sum(1) as occurancy from fact_table where city
> in ("abc") group by city order by occurancy desc limit 100;
>
> If the "product_name" is a UHC column, you'd better use a non-dict
> encoding (like "fixed_length") method for it.
>
>
>
> 2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:
>
>> group by SUM, or group by COUNT is reasonable and supported. There is no
>> order by name alphabetical support.
>>
>> 2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
>>
>> > ​hi,
>> >
>> > i believe it's not supported. besides, how do you define "order" on
>> string?
>> > I don't think it's a reasonable requirement
>> >
>> > --
>> > Regards,
>> >
>> > *Bin Mahone | 马洪宾*
>> >
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


-- 
Best regards,

Shaofeng Shi 史少锋

Re: Question regarding topN measure on string column

Posted by ShaoFeng Shi <sh...@apache.org>.
Kylin TopN's "sum|order by" supports two options a) a numeric column, b)
constant 1.

The option b) can match your requirement in my understanding. You just need
define "product_name" as the "group by" column in TopN, and constant 1 as
the "sum|order by" column; dont' forget to use "city" as cube's dimension,
then you can fetch the top products with SQL like:

select city, product_name, sum(1) as occurancy from fact_table where city
in ("abc") group by city order by occurancy desc limit 100;

If the "product_name" is a UHC column, you'd better use a non-dict encoding
(like "fixed_length") method for it.



2017-04-01 9:07 GMT+08:00 Billy Liu <bi...@apache.org>:

> group by SUM, or group by COUNT is reasonable and supported. There is no
> order by name alphabetical support.
>
> 2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:
>
> > ​hi,
> >
> > i believe it's not supported. besides, how do you define "order" on
> string?
> > I don't think it's a reasonable requirement
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Question regarding topN measure on string column

Posted by Billy Liu <bi...@apache.org>.
group by SUM, or group by COUNT is reasonable and supported. There is no
order by name alphabetical support.

2017-03-31 20:16 GMT+08:00 hongbin ma <ma...@apache.org>:

> ​hi,
>
> i believe it's not supported. besides, how do you define "order" on string?
> I don't think it's a reasonable requirement
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>

Re: Question regarding topN measure on string column

Posted by hongbin ma <ma...@apache.org>.
​hi,

i believe it's not supported. besides, how do you define "order" on string?
I don't think it's a reasonable requirement

-- 
Regards,

*Bin Mahone | 马洪宾*