Posted to user@kylin.apache.org by "Lu, Kang-Sen" <kl...@rbbn.com> on 2020/01/08 16:54:44 UTC

question about kylin query engine's data accuracy

I am running Kylin 2.6. I just noticed that when I run the following SQL statement against Hive and Kylin, the results are different. Has anyone seen the same problem?

SELECT concat(concat(A_VL_HOURLY_V.THEDATE, A_VL_HOURLY_V.THEHOUR), '00') TIME_KEY,
       COUNT(*) vl_aggs_model___SUM_EVENT_COUNT
FROM A_VL_HOURLY_V
WHERE (A_VL_HOURLY_V.THEDATE = '20190511')
  AND (A_VL_HOURLY_V.THEHOUR = '01')
GROUP BY A_VL_HOURLY_V.THEDATE, A_VL_HOURLY_V.THEHOUR;

Hive query returns:

+---------------+----------------------------------+--+
|   time_key    | vl_aggs_model___sum_event_count  |
+---------------+----------------------------------+--+
| 201905110100  | 8968815                          |
+---------------+----------------------------------+--+

The Kylin Insight page returned:

201905110100 | 8968800

The row count in Hive is 15 more than in Kylin.
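
To put the gap in perspective, here is a minimal Python sketch that computes the absolute and relative difference between the two counts reported above. The helper name `count_gap` is hypothetical and only for illustration; the two numbers are copied from the Hive and Kylin results in this message.

```python
# Counts copied from the two result sets quoted above.
hive_count = 8968815   # COUNT(*) returned by Hive
kylin_count = 8968800  # COUNT(*) returned by Kylin


def count_gap(source_count, cube_count):
    """Return (absolute difference, difference as a fraction of source_count)."""
    diff = source_count - cube_count
    return diff, diff / source_count


diff, fraction = count_gap(hive_count, kylin_count)
print(f"absolute difference: {diff}")
print(f"relative difference: {fraction:.2e}")
```

The absolute difference confirms the 15 missing rows; the relative difference only shows how small the gap is compared with the total and says nothing about its cause.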

Thanks.

Kang-sen


-----------------------------------------------------------------------------------------------------------------------
Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. that
is confidential and/or proprietary for the sole use of the intended recipient.  Any review, disclosure, reliance or
distribution by others or forwarding without express permission is strictly prohibited.  If you are not the intended
recipient, please notify the sender immediately and then delete all copies, including any attachments.
-----------------------------------------------------------------------------------------------------------------------

Re: question about kylin query engine's data accuracy

Posted by ShaoFeng Shi <sh...@apache.org>.
Did you check the answers in the FAQ? It lists several situations where this can happen.
https://kylin.apache.org/docs/gettingstarted/faq.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




nichunen <ni...@apache.org> wrote on Thursday, January 9, 2020 at 9:45 AM:

> Hi Kang-sen,
>
> Can you reproduce it with Kylin’s sample cube?
>
> Best regards,
>
> Ni Chunen / George

Re: question about kylin query engine's data accuracy

Posted by nichunen <ni...@apache.org>.

Hi Kang-sen,

Can you reproduce it with Kylin’s sample cube?

Best regards,

Ni Chunen / George


