You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Gautam Parai (JIRA)" <ji...@apache.org> on 2019/05/07 18:23:00 UTC
[jira] [Updated] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run
with Statistics enabled at sf100
[ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gautam Parai updated DRILL-7227:
--------------------------------
Labels: ready-to-commit (was: )
> TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
> ---------------------------------------------------------------------
>
> Key: DRILL-7227
> URL: https://issues.apache.org/jira/browse/DRILL-7227
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.16.0
> Reporter: Robert Hou
> Assignee: Gautam Parai
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, 2338ae93-155b-356d-382e-0da949c6f439.sys.drill
>
>
> Here is query 78:
> {noformat}
> WITH ws
> AS (SELECT d_year AS ws_sold_year,
> ws_item_sk,
> ws_bill_customer_sk ws_customer_sk,
> Sum(ws_quantity) ws_qty,
> Sum(ws_wholesale_cost) ws_wc,
> Sum(ws_sales_price) ws_sp
> FROM web_sales
> LEFT JOIN web_returns
> ON wr_order_number = ws_order_number
> AND ws_item_sk = wr_item_sk
> JOIN date_dim
> ON ws_sold_date_sk = d_date_sk
> WHERE wr_order_number IS NULL
> GROUP BY d_year,
> ws_item_sk,
> ws_bill_customer_sk),
> cs
> AS (SELECT d_year AS cs_sold_year,
> cs_item_sk,
> cs_bill_customer_sk cs_customer_sk,
> Sum(cs_quantity) cs_qty,
> Sum(cs_wholesale_cost) cs_wc,
> Sum(cs_sales_price) cs_sp
> FROM catalog_sales
> LEFT JOIN catalog_returns
> ON cr_order_number = cs_order_number
> AND cs_item_sk = cr_item_sk
> JOIN date_dim
> ON cs_sold_date_sk = d_date_sk
> WHERE cr_order_number IS NULL
> GROUP BY d_year,
> cs_item_sk,
> cs_bill_customer_sk),
> ss
> AS (SELECT d_year AS ss_sold_year,
> ss_item_sk,
> ss_customer_sk,
> Sum(ss_quantity) ss_qty,
> Sum(ss_wholesale_cost) ss_wc,
> Sum(ss_sales_price) ss_sp
> FROM store_sales
> LEFT JOIN store_returns
> ON sr_ticket_number = ss_ticket_number
> AND ss_item_sk = sr_item_sk
> JOIN date_dim
> ON ss_sold_date_sk = d_date_sk
> WHERE sr_ticket_number IS NULL
> GROUP BY d_year,
> ss_item_sk,
> ss_customer_sk)
> SELECT ss_item_sk,
> Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio,
> ss_qty store_qty,
> ss_wc
> store_wholesale_cost,
> ss_sp
> store_sales_price,
> COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0)
> other_chan_qty,
> COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0)
> other_chan_wholesale_cost,
> COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0)
> other_chan_sales_price
> FROM ss
> LEFT JOIN ws
> ON ( ws_sold_year = ss_sold_year
> AND ws_item_sk = ss_item_sk
> AND ws_customer_sk = ss_customer_sk )
> LEFT JOIN cs
> ON ( cs_sold_year = ss_sold_year
> AND cs_item_sk = cs_item_sk
> AND cs_customer_sk = ss_customer_sk )
> WHERE COALESCE(ws_qty, 0) > 0
> AND COALESCE(cs_qty, 0) > 0
> AND ss_sold_year = 1999
> ORDER BY ss_item_sk,
> ss_qty DESC,
> ss_wc DESC,
> ss_sp DESC,
> other_chan_qty,
> other_chan_wholesale_cost,
> other_chan_sales_price,
> Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
> LIMIT 100;
> {noformat}
> The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439. Hash partition sender operator (10-00) takes 10-15 minutes. I am not sure why it takes so long. It has 10 minor fragments sending to receiver (06-05), which has 62 minor fragments. But hash partition sender (16-00) has 10 minor fragments sending to receiver (12-06), which has 220 minor fragments, and there is no performance issue.
> The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1. Both plans use the same commit. The old plan is created by disabling statistics.
> I have not included the plans in the Jira because Jira has a max of 32K.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)