Posted to user@kylin.apache.org by 杨浩 <ya...@gmail.com> on 2017/09/01 06:47:25 UTC

how to filter long tail data

If an index is less than 2, we don't want to store it in HBase. How can we
filter out the long-tail data?

Re: how to filter long tail data

Posted by 杨浩 <ya...@gmail.com>.
It's an elegant implementation. I have read the article
approximate-topn-measure
<http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>, but
we ran into some problems in our situation:

   1. The result is approximate. Our team supplies statistical data for
   our company, a big company, and we don't want to be challenged by our users.
   2. Filtering data after all cuboids are generated is a little
   different. Suppose we have the dimensions date, appId, appVersion,
   channel and the measures dayActiveUseCount, dayNewUseCount, dayUseCount,
   7dayActiveUseCount, and we want to filter out data whose
   dayActiveUseCount is less than 2. This is very hard to implement with
   Top-N, but using the default measure "_COUNT_" to filter data after all
   cuboids are generated may be OK.
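
To illustrate why an approximate top-N result (point 1) can report counts
that differ from the exact ones, here is a minimal sketch of Space-Saving, a
well-known bounded-memory heavy-hitters algorithm. It is not taken from
Kylin's source code; the function name space_saving and its capacity
parameter are hypothetical and chosen only for this example.

```python
def space_saving(stream, capacity):
    """Track at most `capacity` items; return (counts, max_overestimate)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    counts = {}   # item -> estimated count (never below the true count)
    errors = {}   # item -> how much the estimate may exceed the true count
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the smallest counter and let the new item inherit it.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    return counts, errors


# "c" occurs once, but it inherits the evicted counter of "b".
counts, errors = space_saving(["a"] * 5 + ["b"] * 2 + ["c"], capacity=2)
print(counts)  # {'a': 5, 'c': 3}
print(errors)  # {'a': 0, 'c': 2}
```

In this sketch the heavy item "a" is counted exactly, while the tail item
"c" is overestimated by up to 2. That kind of bounded error is what makes an
approximate measure hard to defend when users expect exact statistics.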

It seems we have to change the source code and supply a parameter to
filter data by "_COUNT_" after all cuboids are generated.


I have a question about the Top-N measure: does it also filter data for the
default measure _COUNT_, which is not in the Top-N?



2017-09-05 15:28 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:

> Cool, that is the case of top N.
>
> 2017-09-05 12:00 GMT+08:00 杨浩 <ya...@gmail.com>:
>
>> Thanks. We would like to try the Top-N measure. The "filter condition"
>> filters data at the source, but we want to filter the data after all
>> cuboids are built, because we can't identify the long-tail data until the
>> build is done.
>>
>>
>> 2017-09-04 11:01 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:
>>
>>> The Top-N measure is aimed at filtering out the long-tail data. Besides,
>>> in the data model there is a "filter condition", where you can add a
>>> filtering condition to exclude that tail data.
>>>
>>> 2017-09-04 10:54 GMT+08:00 杨浩 <ya...@gmail.com>:
>>>
>>>> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
>>>> long-tail data after building. Can this data be filtered directly by
>>>> Kylin, or do we have to make some changes to the code?
>>>>
>>>> 2017-09-03 19:42 GMT+08:00 Li Yang <li...@apache.org>:
>>>>
>>>>> Please ask Kylin-related questions here.
>>>>>
>>>>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <ya...@gmail.com> wrote:
>>>>>
>>>>> > If an index is less than 2, we don't want to store it in HBase. How
>>>>> > can we filter out the long-tail data?
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Re: how to filter long tail data

Posted by ShaoFeng Shi <sh...@apache.org>.
Cool, that is the case of top N.

2017-09-05 12:00 GMT+08:00 杨浩 <ya...@gmail.com>:

> Thanks. We would like to try the Top-N measure. The "filter condition"
> filters data at the source, but we want to filter the data after all
> cuboids are built, because we can't identify the long-tail data until the
> build is done.
>
>
> 2017-09-04 11:01 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:
>
>> The Top-N measure is aimed at filtering out the long-tail data. Besides,
>> in the data model there is a "filter condition", where you can add a
>> filtering condition to exclude that tail data.
>>
>> 2017-09-04 10:54 GMT+08:00 杨浩 <ya...@gmail.com>:
>>
>>> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
>>> long-tail data after building. Can this data be filtered directly by
>>> Kylin, or do we have to make some changes to the code?
>>>
>>> 2017-09-03 19:42 GMT+08:00 Li Yang <li...@apache.org>:
>>>
>>>> Please ask Kylin-related questions here.
>>>>
>>>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <ya...@gmail.com> wrote:
>>>>
>>>> > If an index is less than 2, we don't want to store it in HBase. How
>>>> > can we filter out the long-tail data?
>>>> >
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


-- 
Best regards,

Shaofeng Shi 史少锋

Re: how to filter long tail data

Posted by 杨浩 <ya...@gmail.com>.
Thanks. We would like to try the Top-N measure. The "filter condition"
filters data at the source, but we want to filter the data after all cuboids
are built, because we can't identify the long-tail data until the build is
done.
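
The difference between filtering at the source and filtering after
aggregation can be sketched in plain Python. This is only an illustration
with made-up rows and hypothetical helper names (aggregate, drop_long_tail);
it does not use any Kylin API.

```python
from collections import defaultdict

# Hypothetical source rows: (date, app_id, channel, user_id).
ROWS = [
    ("2017-09-01", "app1", "web", "u1"),
    ("2017-09-01", "app1", "web", "u2"),
    ("2017-09-01", "app1", "store", "u3"),
    ("2017-09-01", "app2", "web", "u4"),
]


def aggregate(rows):
    """Count rows per (date, app_id, channel), similar to a COUNT measure."""
    counts = defaultdict(int)
    for date, app_id, channel, _user in rows:
        counts[(date, app_id, channel)] += 1
    return dict(counts)


def drop_long_tail(aggregated, min_count):
    """Keep only the groups whose aggregated count reaches min_count."""
    return {key: n for key, n in aggregated.items() if n >= min_count}


# A row-level (source) filter sees one row at a time, so it cannot know
# that a group will end up with a count below 2. The threshold can only
# be applied once the aggregation has been computed.
kept = drop_long_tail(aggregate(ROWS), min_count=2)
print(kept)  # {('2017-09-01', 'app1', 'web'): 2}
```

In SQL terms, a source-side filter behaves like a WHERE clause, while the
filtering wanted here behaves like a HAVING clause on the aggregated result.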


2017-09-04 11:01 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:

> The Top-N measure is aimed at filtering out the long-tail data. Besides,
> in the data model there is a "filter condition", where you can add a
> filtering condition to exclude that tail data.
>
> 2017-09-04 10:54 GMT+08:00 杨浩 <ya...@gmail.com>:
>
>> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
>> long-tail data after building. Can this data be filtered directly by
>> Kylin, or do we have to make some changes to the code?
>>
>> 2017-09-03 19:42 GMT+08:00 Li Yang <li...@apache.org>:
>>
>>> Please ask Kylin-related questions here.
>>>
>>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <ya...@gmail.com> wrote:
>>>
>>> > If an index is less than 2, we don't want to store it in HBase. How
>>> > can we filter out the long-tail data?
>>> >
>>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Re: how to filter long tail data

Posted by ShaoFeng Shi <sh...@apache.org>.
The Top-N measure is aimed at filtering out the long-tail data. Besides, in
the data model there is a "filter condition", where you can add a filtering
condition to exclude that tail data.

2017-09-04 10:54 GMT+08:00 杨浩 <ya...@gmail.com>:

> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
> long-tail data after building. Can this data be filtered directly by
> Kylin, or do we have to make some changes to the code?
>
> 2017-09-03 19:42 GMT+08:00 Li Yang <li...@apache.org>:
>
>> Please ask Kylin-related questions here.
>>
>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <ya...@gmail.com> wrote:
>>
>> > If an index is less than 2, we don't want to store it in HBase. How
>> > can we filter out the long-tail data?
>> >
>>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋

Re: how to filter long tail data

Posted by 杨浩 <ya...@gmail.com>.
Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
long-tail data after building. Can this data be filtered directly by Kylin,
or do we have to make some changes to the code?

2017-09-03 19:42 GMT+08:00 Li Yang <li...@apache.org>:

> Please ask Kylin-related questions here.
>
> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <ya...@gmail.com> wrote:
>
> > If an index is less than 2, we don't want to store it in HBase. How
> > can we filter out the long-tail data?
> >
>

Re: how to filter long tail data

Posted by Li Yang <li...@apache.org>.
Please ask Kylin-related questions here.

On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <ya...@gmail.com> wrote:

> If an index is less than 2, we don't want to store it in HBase. How can we
> filter out the long-tail data?
>
