You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by 陈兆卿 <ch...@dangdang.com> on 2018/02/11 10:44:11 UTC

Got different results with two SQLs in Kylin, but same in Hive

Hi devs of Kylin:
I’m a engineer from China and new to Kylin. I came across a problem when using it.
I had two queries that I believed should return the same result. But actually they did not in Kylin.
I tried them in Hive and they returned the same.
SQL1 will return the same result as in Hive, but the SQL2.
It seems that the SQL2 returned a error result.
I cannot figure it out.
So I need your help.

The two queries are:
SQL1:
SELECT count(*)
FROM search_order_15
LEFT JOIN order_product_15_dim ON (search_order_15.dt=order_product_15_dim.dt
                                   AND search_order_15.main_product_id=order_product_15_dim.main_product_id)
WHERE search_order_15.dt=date'2017-01-01';

SQL2:
   SELECT count(*)
FROM
  (SELECT *
   FROM search_order_15
   WHERE dt=date'2017-01-01') lt
LEFT JOIN
  (SELECT *
   FROM order_product_15_dim
   WHERE dt=date'2017-01-01') rt ON (lt.main_product_id=rt.main_product_id);


The model is:
[cid:image001.png@01D3A362.DDA6B950]

(Kylin 2.2.0)

Re: Got different results with two SQLs in Kylin, but same in Hive

Posted by Billy Liu <bi...@apache.org>.
Hello Zhaoqing,

Very good question. Kylin is designed for aggregated query. But in SQL2,
the subquery will do the detailed query first. In Kylin, the detailed query
will get result from based cuboid, not the raw fact table. The result is
not accurate. To learn more about this behavior, please check
https://issues.apache.org/jira/browse/KYLIN-1792


With Warm regards

Billy Liu

2018-02-11 18:44 GMT+08:00 陈兆卿 <ch...@dangdang.com>:

> Hi devs of Kylin:
>
> I’m a engineer from China and new to Kylin. I came across a problem when
> using it.
>
> I had two queries that I believed should return the same result. But
> actually they did not in Kylin.
>
> I tried them in Hive and they returned the same.
>
> SQL1 will return the same result as in Hive, but the SQL2.
>
> It seems that the SQL2 returned a error result.
>
> I cannot figure it out.
>
> So I need your help.
>
>
>
> The two queries are:
>
> *SQL1:*
>
> *SELECT count(*)*
>
> *FROM search_order_15*
>
> *LEFT JOIN order_product_15_dim ON
> (search_order_15.dt=order_product_15_dim.dt*
>
> *                                   AND
> search_order_15.main_product_id=order_product_15_dim.main_product_id)*
>
> *WHERE search_order_15.dt=date'2017-01-01**';*
>
>
>
> *SQL2:*
>
> *   SELECT count(*)*
>
> *FROM*
>
> *  (SELECT **
>
> *   FROM search_order_15*
>
> *   WHERE dt=date'2017-01-01') lt*
>
> *LEFT JOIN*
>
> *  (SELECT **
>
> *   FROM order_product_15_dim*
>
> *   WHERE dt=date'2017-01-01') rt ON
> (lt.main_product_id=rt.main_product_id);*
>
>
>
>
>
> The model is:
>
>
>
> (Kylin 2.2.0)
>