Posted to user@kylin.apache.org by "Lu, Kang-Sen" <kl...@rbbn.com> on 2020/01/08 16:54:44 UTC
question about kylin query engine's data accuracy
I am running Kylin 2.6. I just noticed that when I run the following SQL statement against Hive and Kylin, the results are different. Has anyone seen the same problem?
SELECT concat(concat(A_VL_HOURLY_V.THEDATE, A_VL_HOURLY_V.THEHOUR), '00') TIME_KEY, COUNT(*) vl_aggs_model___SUM_EVENT_COUNT FROM A_VL_HOURLY_V WHERE (A_VL_HOURLY_V.THEDATE = '20190511' ) AND (A_VL_HOURLY_V.THEHOUR = '01') GROUP BY A_VL_HOURLY_V.THEDATE, A_VL_HOURLY_V.THEHOUR ;
Hive query returns:
+---------------+----------------------------------+--+
| time_key | vl_aggs_model___sum_event_count |
+---------------+----------------------------------+--+
| 201905110100 | 8968815 |
+---------------+----------------------------------+--+
Kylin Insight returned:
201905110100 | 8968800
The row count in Hive is 15 higher than in Kylin.
Thanks.
Kang-sen
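For anyone comparing the two result sets by hand, here is a minimal sketch of the check described above. It assumes each engine's output has already been loaded into a dict keyed by TIME_KEY; the helper name `diff_counts` and the dict layout are hypothetical and are not part of Kylin or Hive.

```python
# Hypothetical helper (not a Kylin or Hive API): compare per-key COUNT(*)
# results obtained from two engines and report only the keys that differ.
def diff_counts(hive_rows, kylin_rows):
    """Return {key: (hive_count, kylin_count, hive_count - kylin_count)}."""
    diffs = {}
    # Walk the union of keys so a group missing on one side is also reported.
    for key in sorted(set(hive_rows) | set(kylin_rows)):
        hive_count = hive_rows.get(key, 0)
        kylin_count = kylin_rows.get(key, 0)
        if hive_count != kylin_count:
            diffs[key] = (hive_count, kylin_count, hive_count - kylin_count)
    return diffs


# The values reported in this message.
hive_rows = {"201905110100": 8968815}
kylin_rows = {"201905110100": 8968800}

for key, (hive_count, kylin_count, delta) in diff_counts(hive_rows, kylin_rows).items():
    print(f"{key}: hive={hive_count} kylin={kylin_count} delta={delta}")
```

Running the same comparison over more hours or dates may show whether the gap is limited to one time range or appears throughout.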
-----------------------------------------------------------------------------------------------------------------------
Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. that
is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or
distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended
recipient, please notify the sender immediately and then delete all copies, including any attachments.
-----------------------------------------------------------------------------------------------------------------------
Re: question about kylin query engine's data accuracy
Posted by ShaoFeng Shi <sh...@apache.org>.
Did you check the answers in the FAQ? It lists several situations in which this can happen.
https://kylin.apache.org/docs/gettingstarted/faq.html
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
nichunen <ni...@apache.org> wrote on Thursday, January 9, 2020 at 9:45 AM:
> Hi Kang-sen,
>
> Can you reproduce it with Kylin’s sample cube?
>
> Best regards,
> Ni Chunen / George
Re: question about kylin query engine's data accuracy
Posted by nichunen <ni...@apache.org>.
Hi Kang-sen,
Can you reproduce it with Kylin’s sample cube?
Best regards,
Ni Chunen / George