You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Abhishek Girish (JIRA)" <ji...@apache.org> on 2015/04/23 02:22:38 UTC

[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

    [ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508205#comment-14508205 ] 

Abhishek Girish commented on DRILL-2380:
----------------------------------------

Verified on Git.Commit.ID a0a1930 (April 21 build)

{code:sql}
WITH ss 
     AS (SELECT i_manufact_id, 
                Sum(ss_ext_sales_price) total_sales 
         FROM   store_sales, 
                date_dim, 
                customer_address, 
                item 
         WHERE  i_manufact_id IN (SELECT i_manufact_id 
                                  FROM   item 
                                  WHERE  i_category IN ( 'Books' )) 
                AND ss_item_sk = i_item_sk 
                AND ss_sold_date_sk = d_date_sk 
                AND d_year = 1999 
                AND d_moy = 3 
                AND ss_addr_sk = ca_address_sk 
                AND ca_gmt_offset = -5 
         GROUP  BY i_manufact_id), 
     cs 
     AS (SELECT i_manufact_id, 
                Sum(cs_ext_sales_price) total_sales 
         FROM   catalog_sales, 
                date_dim, 
                customer_address, 
                item 
         WHERE  i_manufact_id IN (SELECT i_manufact_id 
                                  FROM   item 
                                  WHERE  i_category IN ( 'Books' )) 
                AND cs_item_sk = i_item_sk 
                AND cs_sold_date_sk = d_date_sk 
                AND d_year = 1999 
                AND d_moy = 3 
                AND cs_bill_addr_sk = ca_address_sk 
                AND ca_gmt_offset = -5 
         GROUP  BY i_manufact_id), 
     ws 
     AS (SELECT i_manufact_id, 
                Sum(ws_ext_sales_price) total_sales 
         FROM   web_sales, 
                date_dim, 
                customer_address, 
                item 
         WHERE  i_manufact_id IN (SELECT i_manufact_id 
                                  FROM   item 
                                  WHERE  i_category IN ( 'Books' )) 
                AND ws_item_sk = i_item_sk 
                AND ws_sold_date_sk = d_date_sk 
                AND d_year = 1999 
                AND d_moy = 3 
                AND ws_bill_addr_sk = ca_address_sk 
                AND ca_gmt_offset = -5 
         GROUP  BY i_manufact_id) 
SELECT i_manufact_id, 
               Sum(total_sales) total_sales 
FROM   (SELECT i_manufact_id, total_sales 
        FROM   ss 
        UNION ALL 
        SELECT i_manufact_id, total_sales
        FROM   cs 
        UNION ALL 
        SELECT i_manufact_id, total_sales
        FROM   ws) tmp1 
GROUP  BY i_manufact_id 
ORDER  BY total_sales
LIMIT 10;

Drill:
+---------------+-------------+
| i_manufact_id | total_sales |
+---------------+-------------+
| 930           | 1.18        |
| 818           | 41.86       |
| 913           | 141.9       |
| 784           | 184.9       |
| 488           | 275.08      |
| 993           | 301.6       |
| 700           | 340.52000000000004 |
| 895           | 802.3       |
| 766           | 839.76      |
| 858           | 859.18      |
+---------------+-------------+
10 rows selected (11.535 seconds)

Postgres:
 i_manufact_id | total_sales 
---------------+-------------
           930 |        1.18
           818 |       41.86
           913 |       141.9
           784 |       184.9
           488 |      275.08
           993 |       301.6
           700 |      340.52
           895 |       802.3
           766 |      839.76
           858 |      859.18
(10 rows)
{code}

The issue is now resolved.

> TPC-DS Query 33 and simplified variants return wrong results
> ------------------------------------------------------------
>
>                 Key: DRILL-2380
>                 URL: https://issues.apache.org/jira/browse/DRILL-2380
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Abhishek Girish
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>             Fix For: 0.8.0
>
>
> TPC-DS query 33 returns wrong results. 
> {code:sql}
> WITH ss 
>      AS (SELECT i_manufact_id, 
>                 Sum(ss_ext_sales_price) total_sales 
>          FROM   store_sales, 
>                 date_dim, 
>                 customer_address, 
>                 item 
>          WHERE  i_manufact_id IN (SELECT i_manufact_id 
>                                   FROM   item 
>                                   WHERE  i_category IN ( 'Books' )) 
>                 AND ss_item_sk = i_item_sk 
>                 AND ss_sold_date_sk = d_date_sk 
>                 AND d_year = 1999 
>                 AND d_moy = 3 
>                 AND ss_addr_sk = ca_address_sk 
>                 AND ca_gmt_offset = -5 
>          GROUP  BY i_manufact_id), 
>      cs 
>      AS (SELECT i_manufact_id, 
>                 Sum(cs_ext_sales_price) total_sales 
>          FROM   catalog_sales, 
>                 date_dim, 
>                 customer_address, 
>                 item 
>          WHERE  i_manufact_id IN (SELECT i_manufact_id 
>                                   FROM   item 
>                                   WHERE  i_category IN ( 'Books' )) 
>                 AND cs_item_sk = i_item_sk 
>                 AND cs_sold_date_sk = d_date_sk 
>                 AND d_year = 1999 
>                 AND d_moy = 3 
>                 AND cs_bill_addr_sk = ca_address_sk 
>                 AND ca_gmt_offset = -5 
>          GROUP  BY i_manufact_id), 
>      ws 
>      AS (SELECT i_manufact_id, 
>                 Sum(ws_ext_sales_price) total_sales 
>          FROM   web_sales, 
>                 date_dim, 
>                 customer_address, 
>                 item 
>          WHERE  i_manufact_id IN (SELECT i_manufact_id 
>                                   FROM   item 
>                                   WHERE  i_category IN ( 'Books' )) 
>                 AND ws_item_sk = i_item_sk 
>                 AND ws_sold_date_sk = d_date_sk 
>                 AND d_year = 1999 
>                 AND d_moy = 3 
>                 AND ws_bill_addr_sk = ca_address_sk 
>                 AND ca_gmt_offset = -5 
>          GROUP  BY i_manufact_id) 
> SELECT i_manufact_id, 
>                Sum(total_sales) total_sales 
> FROM   (SELECT i_manufact_id, total_sales 
>         FROM   ss 
>         UNION ALL 
>         SELECT i_manufact_id, total_sales
>         FROM   cs 
>         UNION ALL 
>         SELECT i_manufact_id, total_sales
>         FROM   ws) tmp1 
> GROUP  BY i_manufact_id 
> ORDER  BY total_sales
> LIMIT 10;
> Drill Results:
> +---------------+-------------+
> | i_manufact_id | total_sales |
> +---------------+-------------+
> | 440           | 0.12        |
> | 434           | 13.16       |
> | 415           | 14.04       |
> | 449           | 15.63       |
> | 563           | 31.46       |
> | 357           | 49.50       |
> | 624           | 67.94       |
> | 192           | 74.40       |
> | 137           | 83.42       |
> | 240           | 85.26       |
> +---------------+-------------+
> 10 rows selected (7.57 seconds)
> Postgres Results:
>  i_manufact_id | total_sales 
> ---------------+-------------
>            930 |        1.18
>            818 |       41.86
>            913 |      141.90
>            784 |      184.90
>            488 |      275.08
>            993 |      301.60
>            700 |      340.52
>            895 |      802.30
>            766 |      839.76
>            858 |      859.18
> (10 rows)
> {code}
> The following simplified variants also return wrong results:
> {code:sql}
> SELECT sum(x)
> FROM
> (SELECT ss_ext_sales_price x, ss_item_sk
> FROM  store_sales
>  GROUP BY ss_item_sk, ss_ext_sales_price
> UNION ALL
> SELECT cs_ext_sales_price x, cs_item_sk
> FROM catalog_sales
> GROUP BY cs_item_sk, cs_ext_sales_price) tmp
> GROUP BY x
> LIMIT 10;
> Drill Results:
> +------------+
> |   EXPR$0   |
> +------------+
> | 14141.40   |
> | 28060.00   |
> | 30912.70   |
> | 43706.88   |
> | 38267.64   |
> | 10173.00   |
> | 37829.25   |
> | 5349.50    |
> | 107515.80  |
> | 4440.84    |
> +------------+
> 10 rows selected (14.435 seconds)
> Postgres Results:
>    sum    
> ----------   
>  45234.00
>   5735.31
>   2275.60
>   6921.32
>   2590.46
>   6615.09
>  14080.77
>  24819.76
>  25127.20
> (10 rows)
> SELECT sum(x)
> FROM
> (SELECT sum(ss_ext_sales_price) x, ss_item_sk
> FROM  store_sales
>  GROUP BY ss_item_sk
> UNION ALL
> SELECT sum(cs_ext_sales_price) x, cs_item_sk
> FROM catalog_sales
> GROUP BY cs_item_sk) tmp
> GROUP BY x
> LIMIT 10;
> Drill Results:
> +------------+
> |   EXPR$0   |
> +------------+
> | 211411.58  |
> | 347027.93  |
> | 534760.93  |
> | 203028.28  |
> | 500939.61  |
> | 248226.81  |
> | 242664.29  |
> | 597659.03  |
> | 258909.73  |
> | 223624.06  |
> +------------+
> 10 rows selected (5.245 seconds)
> Postgres Results:
>     sum    
> -----------
>  252711.42
>  173571.97
>  206191.60
>  249793.96
>  170825.75
>  127718.29
>  220887.50
>  119390.44
>  217495.66
>  284348.93
> (10 rows)
> SELECT x
> FROM
> (SELECT ss_ext_sales_price x, ss_item_sk
> FROM  store_sales
>  GROUP BY ss_item_sk, ss_ext_sales_price
> UNION ALL
> SELECT cs_ext_sales_price x, cs_item_sk
> FROM catalog_sales
> GROUP BY cs_item_sk, cs_ext_sales_price) tmp
> GROUP BY x
> LIMIT 10;
> Drill Results:
> +------------+
> |     x      |
> +------------+
> | 271.95     |
> | 561.20     |
> | 391.30     |
> | 1821.12    |
> | 2125.98    |
> | 1695.50    |
> | 1513.17    |
> | 411.50     |
> | 4674.60    |
> | 193.08     |
> +------------+
> 10 rows selected (9.518 seconds)
> Postgres Results:
>     x    
> ---------
>  9046.80
>  5735.31
>   568.90
>  3460.66
>  1295.23
>  6615.09
>  4693.59
>  6204.94
>  6281.80
> (10 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)