You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2020/10/01 15:21:00 UTC

[jira] [Commented] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution

    [ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205606#comment-17205606 ] 

Yuming Wang commented on SPARK-30186:
-------------------------------------

For our internal tpcds q77, enable AQE and enable DPP cannot work properly:
{code:sql}
WITH ss AS
(
         SELECT   s_store_sk,
                  Sum(ss_ext_sales_price) AS sales,
                  Sum(ss_net_profit)      AS profit
         FROM     store_sales,
                  date_dim,
                  store
         WHERE    ss_sold_date_sk = d_date_sk
         AND      d_date BETWEEN Cast('2000-08-23' AS DATE) AND      (
                           Cast('2000-08-23' AS DATE) + interval '30' day)
         AND      ss_store_sk = s_store_sk
         GROUP BY s_store_sk), sr AS
(
         SELECT   s_store_sk,
                  sum(sr_return_amt) AS returns,
                  sum(sr_net_loss)   AS profit_loss
         FROM     store_returns,
                  date_dim,
                  store
         WHERE    sr_returned_date_sk = d_date_sk
         AND      d_date BETWEEN cast('2000-08-23' AS date) AND      (
                           cast('2000-08-23' AS date) + interval '30' day)
         AND      sr_store_sk = s_store_sk
         GROUP BY s_store_sk), cs AS
(
         SELECT   cs_call_center_sk,
                  sum(cs_ext_sales_price) AS sales,
                  sum(cs_net_profit)      AS profit
         FROM     catalog_sales,
                  date_dim
         WHERE    cs_sold_date_sk = d_date_sk
         AND      d_date BETWEEN cast('2000-08-23' AS date) AND      (
                           cast('2000-08-23' AS date) + interval '30' day)
         GROUP BY cs_call_center_sk), cr AS
(
         SELECT   cr_call_center_sk,
                  sum(cr_return_amount) AS returns,
                  sum(cr_net_loss)      AS profit_loss
         FROM     catalog_returns,
                  date_dim
         WHERE    cr_returned_date_sk = d_date_sk
         AND      d_date BETWEEN cast('2000-08-23' AS date) AND      (
                           cast('2000-08-23' AS date) + interval '30' day)
         GROUP BY cr_call_center_sk), ws AS
(
         SELECT   wp_web_page_sk,
                  sum(ws_ext_sales_price) AS sales,
                  sum(ws_net_profit)      AS profit
         FROM     web_sales,
                  date_dim,
                  web_page
         WHERE    ws_sold_date_sk = d_date_sk
         AND      d_date BETWEEN cast('2000-08-23' AS date) AND      (
                           cast('2000-08-23' AS date) + interval '30' day)
         AND      ws_web_page_sk = wp_web_page_sk
         GROUP BY wp_web_page_sk), wr AS
(
         SELECT   wp_web_page_sk,
                  sum(wr_return_amt) AS returns,
                  sum(wr_net_loss)   AS profit_loss
         FROM     web_returns,
                  date_dim,
                  web_page
         WHERE    wr_returned_date_sk = d_date_sk
         AND      d_date BETWEEN cast('2000-08-23' AS date) AND      (
                           cast('2000-08-23' AS date) + interval '30' day)
         AND      wr_web_page_sk = wp_web_page_sk
         GROUP BY wp_web_page_sk)
SELECT   channel,
         id,
         sum(sales)   AS sales,
         sum(returns) AS returns,
         sum(profit)  AS profit
FROM     (
                   SELECT    'store channel' AS channel,
                             ss.s_store_sk   AS id,
                             sales,
                             COALESCE(returns, 0)               AS returns,
                             (profit - COALESCE(profit_loss,0)) AS profit
                   FROM      ss
                   LEFT JOIN sr
                   ON        ss.s_store_sk = sr.s_store_sk
                   UNION ALL
                   SELECT     'catalog channel' AS channel,
                              cs_call_center_sk AS id,
                              sales,
                              returns,
                              (profit - profit_loss) AS profit
                   FROM       cs
                   CROSS JOIN cr
                   UNION ALL
                   SELECT    'web channel'     AS channel,
                             ws.wp_web_page_sk AS id,
                             sales,
                             COALESCE(returns, 0)                  returns,
                             (profit - COALESCE(profit_loss,0)) AS profit
                   FROM      ws
                   LEFT JOIN wr
                   ON        ws.wp_web_page_sk = wr.wp_web_page_sk ) x
GROUP BY rollup(channel, id)
ORDER BY channel,
         id limit 100
{code}


> support Dynamic Partition Pruning in Adaptive Execution
> -------------------------------------------------------
>
>                 Key: SPARK-30186
>                 URL: https://issues.apache.org/jira/browse/SPARK-30186
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Xiaoju Wu
>            Priority: Major
>
> Currently Adaptive Execution cannot work if Dynamic Partition Pruning is applied.
> private def supportAdaptive(plan: SparkPlan): Boolean = {
>  // TODO migrate dynamic-partition-pruning onto adaptive execution.
>  sanityCheck(plan) &&
>  !plan.logicalLink.exists(_.isStreaming) &&
>  *!plan.expressions.exists(_.find(_.isInstanceOf[DynamicPruningSubquery]).isDefined)* &&
>  plan.children.forall(supportAdaptive)
> }
> It means we cannot benefit the performance from both AE and DPP.
> This ticket is target to make DPP + AE works together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org