You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by yu feng <ol...@gmail.com> on 2017/09/27 03:17:39 UTC
query return error result.
I have a cube like this :
dimensions : source_type, source_id, name, dt
measures:count(distinct uid), count(1) , count(distinct buyer)
I run the query :
select source_type, source_id, name,
count(distinct uid), count(uid) as cnum, count(distinct buyer) as
buyerNum,
count(buyer) as bnum
from
table_name
where
dt between '2017-06-01' and '2017-09-18'
and source_id is not null
and source_type is not null
group by
source_type, source_id, name
order by buyerNum desc limit 1 offset 0
return :
mv
423031
起点‧终站
193794
92
42043
92
obviously, it is error result, I query the sourceid like this:
select source_type, source_id, name,
count(distinct uid), count(uid) as cnum, count(distinct buyer) as
buyerNum,
count(buyer) as bnum
from
vip_buying_funnel_cube_view
where
dt between '2017-06-01' and '2017-09-18'
and source_id is not null
and source_type is not null
and source_id = '423031'
group by
source_type, source_id, name
order by buyerNum desc limit 1 offset 0
the result is corrent :
mv
423031
起点‧终站
77
92
11
92
Re: query return error result.
Posted by Billy Liu <bi...@apache.org>.
Nice found. +1
2017-10-11 15:08 GMT+08:00 yu feng <ol...@gmail.com>:
> I checked the code, and find root cause is DumpMerger.enqueueFromDump()
>
> I create a jira KYLIN-2926
> <https://issues.apache.org/jira/browse/KYLIN-2926> to trace the bug.
>
>
>
> 2017-10-09 10:37 GMT+08:00 yu feng <ol...@gmail.com>:
>
>> the cube is using hllc15, we are tracing the code and try to find the
>> reason.
>>
>> 2017-10-08 14:52 GMT+08:00 Li Yang <li...@apache.org>:
>>
>>> Interesting... is it HLL count distinct or bitmap count distinct?
>>>
>>> On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
>>>
>>>> I add some log and find data from hbase is incorrect.
>>>>
>>>> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>>>>
>>>>> I have a cube like this :
>>>>> dimensions : source_type, source_id, name, dt
>>>>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>>>>
>>>>> I run the query :
>>>>>
>>>>> select source_type, source_id, name,
>>>>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>>> buyerNum,
>>>>> count(buyer) as bnum
>>>>> from
>>>>> table_name
>>>>> where
>>>>> dt between '2017-06-01' and '2017-09-18'
>>>>> and source_id is not null
>>>>> and source_type is not null
>>>>> group by
>>>>> source_type, source_id, name
>>>>> order by buyerNum desc limit 1 offset 0
>>>>>
>>>>> return :
>>>>>
>>>>> mv
>>>>> 423031
>>>>> 起点‧终站
>>>>> 193794
>>>>> 92
>>>>> 42043
>>>>> 92
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> obviously, it is error result, I query the sourceid like this:
>>>>>
>>>>> select source_type, source_id, name,
>>>>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>>> buyerNum,
>>>>> count(buyer) as bnum
>>>>> from
>>>>> vip_buying_funnel_cube_view
>>>>> where
>>>>> dt between '2017-06-01' and '2017-09-18'
>>>>> and source_id is not null
>>>>> and source_type is not null
>>>>> and source_id = '423031'
>>>>> group by
>>>>> source_type, source_id, name
>>>>> order by buyerNum desc limit 1 offset 0
>>>>>
>>>>> the result is corrent :
>>>>>
>>>>> mv
>>>>> 423031
>>>>> 起点‧终站
>>>>> 77
>>>>> 92
>>>>> 11
>>>>> 92
>>>>>
>>>>
>>>>
>>>
>>
>
Re: query return error result.
Posted by yu feng <ol...@gmail.com>.
I checked the code, and find root cause is DumpMerger.enqueueFromDump()
I create a jira KYLIN-2926
<https://issues.apache.org/jira/browse/KYLIN-2926> to trace the bug.
2017-10-09 10:37 GMT+08:00 yu feng <ol...@gmail.com>:
> the cube is using hllc15, we are tracing the code and try to find the
> reason.
>
> 2017-10-08 14:52 GMT+08:00 Li Yang <li...@apache.org>:
>
>> Interesting... is it HLL count distinct or bitmap count distinct?
>>
>> On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
>>
>>> I add some log and find data from hbase is incorrect.
>>>
>>> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>>>
>>>> I have a cube like this :
>>>> dimensions : source_type, source_id, name, dt
>>>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>>>
>>>> I run the query :
>>>>
>>>> select source_type, source_id, name,
>>>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>> buyerNum,
>>>> count(buyer) as bnum
>>>> from
>>>> table_name
>>>> where
>>>> dt between '2017-06-01' and '2017-09-18'
>>>> and source_id is not null
>>>> and source_type is not null
>>>> group by
>>>> source_type, source_id, name
>>>> order by buyerNum desc limit 1 offset 0
>>>>
>>>> return :
>>>>
>>>> mv
>>>> 423031
>>>> 起点‧终站
>>>> 193794
>>>> 92
>>>> 42043
>>>> 92
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> obviously, it is error result, I query the sourceid like this:
>>>>
>>>> select source_type, source_id, name,
>>>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>>> buyerNum,
>>>> count(buyer) as bnum
>>>> from
>>>> vip_buying_funnel_cube_view
>>>> where
>>>> dt between '2017-06-01' and '2017-09-18'
>>>> and source_id is not null
>>>> and source_type is not null
>>>> and source_id = '423031'
>>>> group by
>>>> source_type, source_id, name
>>>> order by buyerNum desc limit 1 offset 0
>>>>
>>>> the result is corrent :
>>>>
>>>> mv
>>>> 423031
>>>> 起点‧终站
>>>> 77
>>>> 92
>>>> 11
>>>> 92
>>>>
>>>
>>>
>>
>
Re: query return error result.
Posted by yu feng <ol...@gmail.com>.
the cube is using hllc15, we are tracing the code and try to find the
reason.
2017-10-08 14:52 GMT+08:00 Li Yang <li...@apache.org>:
> Interesting... is it HLL count distinct or bitmap count distinct?
>
> On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
>
>> I add some log and find data from hbase is incorrect.
>>
>> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>>
>>> I have a cube like this :
>>> dimensions : source_type, source_id, name, dt
>>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>>
>>> I run the query :
>>>
>>> select source_type, source_id, name,
>>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>> buyerNum,
>>> count(buyer) as bnum
>>> from
>>> table_name
>>> where
>>> dt between '2017-06-01' and '2017-09-18'
>>> and source_id is not null
>>> and source_type is not null
>>> group by
>>> source_type, source_id, name
>>> order by buyerNum desc limit 1 offset 0
>>>
>>> return :
>>>
>>> mv
>>> 423031
>>> 起点‧终站
>>> 193794
>>> 92
>>> 42043
>>> 92
>>>
>>>
>>>
>>>
>>>
>>> obviously, it is error result, I query the sourceid like this:
>>>
>>> select source_type, source_id, name,
>>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>>> buyerNum,
>>> count(buyer) as bnum
>>> from
>>> vip_buying_funnel_cube_view
>>> where
>>> dt between '2017-06-01' and '2017-09-18'
>>> and source_id is not null
>>> and source_type is not null
>>> and source_id = '423031'
>>> group by
>>> source_type, source_id, name
>>> order by buyerNum desc limit 1 offset 0
>>>
>>> the result is corrent :
>>>
>>> mv
>>> 423031
>>> 起点‧终站
>>> 77
>>> 92
>>> 11
>>> 92
>>>
>>
>>
>
Re: query return error result.
Posted by Li Yang <li...@apache.org>.
Interesting... is it HLL count distinct or bitmap count distinct?
On Wed, Sep 27, 2017 at 11:19 AM, yu feng <ol...@gmail.com> wrote:
> I add some log and find data from hbase is incorrect.
>
> 2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
>
>> I have a cube like this :
>> dimensions : source_type, source_id, name, dt
>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>
>> I run the query :
>>
>> select source_type, source_id, name,
>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>> buyerNum,
>> count(buyer) as bnum
>> from
>> table_name
>> where
>> dt between '2017-06-01' and '2017-09-18'
>> and source_id is not null
>> and source_type is not null
>> group by
>> source_type, source_id, name
>> order by buyerNum desc limit 1 offset 0
>>
>> return :
>>
>> mv
>> 423031
>> 起点‧终站
>> 193794
>> 92
>> 42043
>> 92
>>
>>
>>
>>
>>
>> obviously, it is error result, I query the sourceid like this:
>>
>> select source_type, source_id, name,
>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>> buyerNum,
>> count(buyer) as bnum
>> from
>> vip_buying_funnel_cube_view
>> where
>> dt between '2017-06-01' and '2017-09-18'
>> and source_id is not null
>> and source_type is not null
>> and source_id = '423031'
>> group by
>> source_type, source_id, name
>> order by buyerNum desc limit 1 offset 0
>>
>> the result is corrent :
>>
>> mv
>> 423031
>> 起点‧终站
>> 77
>> 92
>> 11
>> 92
>>
>
>
Re: query return error result.
Posted by yu feng <ol...@gmail.com>.
I add some log and find data from hbase is incorrect.
2017-09-27 11:17 GMT+08:00 yu feng <ol...@gmail.com>:
> I have a cube like this :
> dimensions : source_type, source_id, name, dt
> measures:count(distinct uid), count(1) , count(distinct buyer)
>
> I run the query :
>
> select source_type, source_id, name,
> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
> buyerNum,
> count(buyer) as bnum
> from
> table_name
> where
> dt between '2017-06-01' and '2017-09-18'
> and source_id is not null
> and source_type is not null
> group by
> source_type, source_id, name
> order by buyerNum desc limit 1 offset 0
>
> return :
>
> mv
> 423031
> 起点‧终站
> 193794
> 92
> 42043
> 92
>
>
>
>
>
> obviously, it is error result, I query the sourceid like this:
>
> select source_type, source_id, name,
> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
> buyerNum,
> count(buyer) as bnum
> from
> vip_buying_funnel_cube_view
> where
> dt between '2017-06-01' and '2017-09-18'
> and source_id is not null
> and source_type is not null
> and source_id = '423031'
> group by
> source_type, source_id, name
> order by buyerNum desc limit 1 offset 0
>
> the result is corrent :
>
> mv
> 423031
> 起点‧终站
> 77
> 92
> 11
> 92
>