You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by yu feng <ol...@gmail.com> on 2017/09/27 03:17:39 UTC

query return error result.

I have a cube like this :
dimensions : source_type, source_id, name, dt
measures:count(distinct uid), count(1) , count(distinct buyer)

I run the query :

select source_type, source_id, name,
    count(distinct uid), count(uid) as cnum, count(distinct buyer) as
buyerNum,
    count(buyer) as bnum
from
    table_name
where
    dt between '2017-06-01' and '2017-09-18'
    and source_id is not null
    and source_type is not null
group by
    source_type, source_id, name
order by buyerNum desc limit 1 offset 0

return :

mv
423031
起点‧终站
193794
92
42043
92





obviously, it is error result, I query the sourceid like this:

select source_type, source_id, name,
    count(distinct uid), count(uid) as cnum, count(distinct buyer) as
buyerNum,
    count(buyer) as bnum
from
    vip_buying_funnel_cube_view
where
    dt between '2017-06-01' and '2017-09-18'
    and source_id is not null
    and source_type is not null
    and source_id = '423031'
group by
    source_type, source_id, name
order by buyerNum desc limit 1 offset 0

the result is corrent :

mv
423031
起点‧终站
77
92
11
92

Re: query return error result.

Posted by Billy Liu <bi...@apache.org>.
Nice found. +1

2017-10-11 15:08 GMT+08:00 yu feng <ol...@gmail.com>:

> I checked the code, and find root cause is DumpMerger.enqueueFromDump()
>
> I create a jira KYLIN-2926
> <https://issues.apache.org/jira/browse/KYLIN-2926> to trace the bug.
>
>
>
> 2017-10-09 10:37 GMT+08:00 yu feng <ol...@gmail.com>:
>
>> the cube is using hllc15, we are tracing the code and try to find the
>> reason.
>>
>> 2017-10-08 14:52 GMT+08:00 Li Yang <li...@apache.org>:
>>
>>> Interesting... is it HLL count distinct or bitmap count distinct?
>>>
>>> On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
>>>
>>>> I add some log and find data from hbase is incorrect.
>>>>
>>>> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>>>>
>>>>> I have a cube like this :
>>>>> dimensions : source_type, source_id, name, dt
>>>>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>>>>
>>>>> I run the query :
>>>>>
>>>>> select source_type, source_id, name,
>>>>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>>> buyerNum,
>>>>>     count(buyer) as bnum
>>>>> from
>>>>>     table_name
>>>>> where
>>>>>     dt between '2017-06-01' and '2017-09-18'
>>>>>     and source_id is not null
>>>>>     and source_type is not null
>>>>> group by
>>>>>     source_type, source_id, name
>>>>> order by buyerNum desc limit 1 offset 0
>>>>>
>>>>> return :
>>>>>
>>>>> mv
>>>>> 423031
>>>>> 起点‧终站
>>>>> 193794
>>>>> 92
>>>>> 42043
>>>>> 92
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> obviously, it is error result, I query the sourceid like this:
>>>>>
>>>>> select source_type, source_id, name,
>>>>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>>> buyerNum,
>>>>>     count(buyer) as bnum
>>>>> from
>>>>>     vip_buying_funnel_cube_view
>>>>> where
>>>>>     dt between '2017-06-01' and '2017-09-18'
>>>>>     and source_id is not null
>>>>>     and source_type is not null
>>>>>     and source_id = '423031'
>>>>> group by
>>>>>     source_type, source_id, name
>>>>> order by buyerNum desc limit 1 offset 0
>>>>>
>>>>> the result is corrent :
>>>>>
>>>>> mv
>>>>> 423031
>>>>> 起点‧终站
>>>>> 77
>>>>> 92
>>>>> 11
>>>>> 92
>>>>>
>>>>
>>>>
>>>
>>
>

Re: query return error result.

Posted by yu feng <ol...@gmail.com>.
I checked the code, and find root cause is DumpMerger.enqueueFromDump()

I create a jira KYLIN-2926
<https://issues.apache.org/jira/browse/KYLIN-2926> to trace the bug.



2017-10-09 10:37 GMT+08:00 yu feng <ol...@gmail.com>:

> the cube is using hllc15, we are tracing the code and try to find the
> reason.
>
> 2017-10-08 14:52 GMT+08:00 Li Yang <li...@apache.org>:
>
>> Interesting... is it HLL count distinct or bitmap count distinct?
>>
>> On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
>>
>>> I add some log and find data from hbase is incorrect.
>>>
>>> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>>>
>>>> I have a cube like this :
>>>> dimensions : source_type, source_id, name, dt
>>>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>>>
>>>> I run the query :
>>>>
>>>> select source_type, source_id, name,
>>>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>> buyerNum,
>>>>     count(buyer) as bnum
>>>> from
>>>>     table_name
>>>> where
>>>>     dt between '2017-06-01' and '2017-09-18'
>>>>     and source_id is not null
>>>>     and source_type is not null
>>>> group by
>>>>     source_type, source_id, name
>>>> order by buyerNum desc limit 1 offset 0
>>>>
>>>> return :
>>>>
>>>> mv
>>>> 423031
>>>> 起点‧终站
>>>> 193794
>>>> 92
>>>> 42043
>>>> 92
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> obviously, it is error result, I query the sourceid like this:
>>>>
>>>> select source_type, source_id, name,
>>>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>> buyerNum,
>>>>     count(buyer) as bnum
>>>> from
>>>>     vip_buying_funnel_cube_view
>>>> where
>>>>     dt between '2017-06-01' and '2017-09-18'
>>>>     and source_id is not null
>>>>     and source_type is not null
>>>>     and source_id = '423031'
>>>> group by
>>>>     source_type, source_id, name
>>>> order by buyerNum desc limit 1 offset 0
>>>>
>>>> the result is corrent :
>>>>
>>>> mv
>>>> 423031
>>>> 起点‧终站
>>>> 77
>>>> 92
>>>> 11
>>>> 92
>>>>
>>>
>>>
>>
>

Re: query return error result.

Posted by yu feng <ol...@gmail.com>.
the cube is using hllc15, we are tracing the code and try to find the
reason.

2017-10-08 14:52 GMT+08:00 Li Yang <li...@apache.org>:

> Interesting... is it HLL count distinct or bitmap count distinct?
>
> On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
>
>> I add some log and find data from hbase is incorrect.
>>
>> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>>
>>> I have a cube like this :
>>> dimensions : source_type, source_id, name, dt
>>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>>
>>> I run the query :
>>>
>>> select source_type, source_id, name,
>>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>> buyerNum,
>>>     count(buyer) as bnum
>>> from
>>>     table_name
>>> where
>>>     dt between '2017-06-01' and '2017-09-18'
>>>     and source_id is not null
>>>     and source_type is not null
>>> group by
>>>     source_type, source_id, name
>>> order by buyerNum desc limit 1 offset 0
>>>
>>> return :
>>>
>>> mv
>>> 423031
>>> 起点‧终站
>>> 193794
>>> 92
>>> 42043
>>> 92
>>>
>>>
>>>
>>>
>>>
>>> obviously, it is error result, I query the sourceid like this:
>>>
>>> select source_type, source_id, name,
>>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>> buyerNum,
>>>     count(buyer) as bnum
>>> from
>>>     vip_buying_funnel_cube_view
>>> where
>>>     dt between '2017-06-01' and '2017-09-18'
>>>     and source_id is not null
>>>     and source_type is not null
>>>     and source_id = '423031'
>>> group by
>>>     source_type, source_id, name
>>> order by buyerNum desc limit 1 offset 0
>>>
>>> the result is corrent :
>>>
>>> mv
>>> 423031
>>> 起点‧终站
>>> 77
>>> 92
>>> 11
>>> 92
>>>
>>
>>
>

Re: query return error result.

Posted by Li Yang <li...@apache.org>.
Interesting... is it HLL count distinct or bitmap count distinct?

On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:

> I add some log and find data from hbase is incorrect.
>
> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>
>> I have a cube like this :
>> dimensions : source_type, source_id, name, dt
>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>
>> I run the query :
>>
>> select source_type, source_id, name,
>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>> buyerNum,
>>     count(buyer) as bnum
>> from
>>     table_name
>> where
>>     dt between '2017-06-01' and '2017-09-18'
>>     and source_id is not null
>>     and source_type is not null
>> group by
>>     source_type, source_id, name
>> order by buyerNum desc limit 1 offset 0
>>
>> return :
>>
>> mv
>> 423031
>> 起点‧终站
>> 193794
>> 92
>> 42043
>> 92
>>
>>
>>
>>
>>
>> obviously, it is error result, I query the sourceid like this:
>>
>> select source_type, source_id, name,
>>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>> buyerNum,
>>     count(buyer) as bnum
>> from
>>     vip_buying_funnel_cube_view
>> where
>>     dt between '2017-06-01' and '2017-09-18'
>>     and source_id is not null
>>     and source_type is not null
>>     and source_id = '423031'
>> group by
>>     source_type, source_id, name
>> order by buyerNum desc limit 1 offset 0
>>
>> the result is corrent :
>>
>> mv
>> 423031
>> 起点‧终站
>> 77
>> 92
>> 11
>> 92
>>
>
>

Re: query return error result.

Posted by yu feng <ol...@gmail.com>.
I add some log and find data from hbase is incorrect.

2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:

> I have a cube like this :
> dimensions : source_type, source_id, name, dt
> measures:count(distinct uid), count(1) , count(distinct buyer)
>
> I run the query :
>
> select source_type, source_id, name,
>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
> buyerNum,
>     count(buyer) as bnum
> from
>     table_name
> where
>     dt between '2017-06-01' and '2017-09-18'
>     and source_id is not null
>     and source_type is not null
> group by
>     source_type, source_id, name
> order by buyerNum desc limit 1 offset 0
>
> return :
>
> mv
> 423031
> 起点‧终站
> 193794
> 92
> 42043
> 92
>
>
>
>
>
> obviously, it is error result, I query the sourceid like this:
>
> select source_type, source_id, name,
>     count(distinct uid), count(uid) as cnum, count(distinct buyer) as
> buyerNum,
>     count(buyer) as bnum
> from
>     vip_buying_funnel_cube_view
> where
>     dt between '2017-06-01' and '2017-09-18'
>     and source_id is not null
>     and source_type is not null
>     and source_id = '423031'
> group by
>     source_type, source_id, name
> order by buyerNum desc limit 1 offset 0
>
> the result is corrent :
>
> mv
> 423031
> 起点‧终站
> 77
> 92
> 11
> 92
>