You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Gautam Parai (JIRA)" <ji...@apache.org> on 2019/05/07 18:23:00 UTC

[jira] [Updated] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100

     [ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Parai updated DRILL-7227:
--------------------------------
    Labels: ready-to-commit  (was: )

> TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
> ---------------------------------------------------------------------
>
>                 Key: DRILL-7227
>                 URL: https://issues.apache.org/jira/browse/DRILL-7227
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.16.0
>            Reporter: Robert Hou
>            Assignee: Gautam Parai
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.17.0
>
>         Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, 2338ae93-155b-356d-382e-0da949c6f439.sys.drill
>
>
> Here is query 78:
> {noformat}
> WITH ws 
>      AS (SELECT d_year                 AS ws_sold_year, 
>                 ws_item_sk, 
>                 ws_bill_customer_sk    ws_customer_sk, 
>                 Sum(ws_quantity)       ws_qty, 
>                 Sum(ws_wholesale_cost) ws_wc, 
>                 Sum(ws_sales_price)    ws_sp 
>          FROM   web_sales 
>                 LEFT JOIN web_returns 
>                        ON wr_order_number = ws_order_number 
>                           AND ws_item_sk = wr_item_sk 
>                 JOIN date_dim 
>                   ON ws_sold_date_sk = d_date_sk 
>          WHERE  wr_order_number IS NULL 
>          GROUP  BY d_year, 
>                    ws_item_sk, 
>                    ws_bill_customer_sk), 
>      cs 
>      AS (SELECT d_year                 AS cs_sold_year, 
>                 cs_item_sk, 
>                 cs_bill_customer_sk    cs_customer_sk, 
>                 Sum(cs_quantity)       cs_qty, 
>                 Sum(cs_wholesale_cost) cs_wc, 
>                 Sum(cs_sales_price)    cs_sp 
>          FROM   catalog_sales 
>                 LEFT JOIN catalog_returns 
>                        ON cr_order_number = cs_order_number 
>                           AND cs_item_sk = cr_item_sk 
>                 JOIN date_dim 
>                   ON cs_sold_date_sk = d_date_sk 
>          WHERE  cr_order_number IS NULL 
>          GROUP  BY d_year, 
>                    cs_item_sk, 
>                    cs_bill_customer_sk), 
>      ss 
>      AS (SELECT d_year                 AS ss_sold_year, 
>                 ss_item_sk, 
>                 ss_customer_sk, 
>                 Sum(ss_quantity)       ss_qty, 
>                 Sum(ss_wholesale_cost) ss_wc, 
>                 Sum(ss_sales_price)    ss_sp 
>          FROM   store_sales 
>                 LEFT JOIN store_returns 
>                        ON sr_ticket_number = ss_ticket_number 
>                           AND ss_item_sk = sr_item_sk 
>                 JOIN date_dim 
>                   ON ss_sold_date_sk = d_date_sk 
>          WHERE  sr_ticket_number IS NULL 
>          GROUP  BY d_year, 
>                    ss_item_sk, 
>                    ss_customer_sk) 
> SELECT ss_item_sk, 
>                Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio, 
>                ss_qty                                              store_qty, 
>                ss_wc 
>                store_wholesale_cost, 
>                ss_sp 
>                store_sales_price, 
>                COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) 
>                other_chan_qty, 
>                COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) 
>                other_chan_wholesale_cost, 
>                COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) 
>                other_chan_sales_price 
> FROM   ss 
>        LEFT JOIN ws 
>               ON ( ws_sold_year = ss_sold_year 
>                    AND ws_item_sk = ss_item_sk 
>                    AND ws_customer_sk = ss_customer_sk ) 
>        LEFT JOIN cs 
>               ON ( cs_sold_year = ss_sold_year 
>                    AND cs_item_sk = cs_item_sk 
>                    AND cs_customer_sk = ss_customer_sk ) 
> WHERE  COALESCE(ws_qty, 0) > 0 
>        AND COALESCE(cs_qty, 0) > 0 
>        AND ss_sold_year = 1999 
> ORDER  BY ss_item_sk, 
>           ss_qty DESC, 
>           ss_wc DESC, 
>           ss_sp DESC, 
>           other_chan_qty, 
>           other_chan_wholesale_cost, 
>           other_chan_sales_price, 
>           Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
> LIMIT 100; 
> {noformat}
> The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439.  Hash partition sender operator (10-00) takes 10-15 minutes.  I am not sure why it takes so long.  It has 10 minor fragments sending to receiver (06-05), which has 62 minor fragments.  But hash partition sender (16-00) has 10 minor fragments sending to receiver (12-06), which has 220 minor fragments, and there is no performance issue.
> The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.  Both plans use the same commit.  The old plan is created by disabling statistics.
> I have not included the plans in the Jira because Jira has a max of 32K.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)