You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2020/10/01 15:21:00 UTC
[jira] [Commented] (SPARK-30186) support Dynamic Partition Pruning
in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205606#comment-17205606 ]
Yuming Wang commented on SPARK-30186:
-------------------------------------
For our internal tpcds q77, enable AQE and enable DPP cannot work properly:
{code:sql}
WITH ss AS
(
SELECT s_store_sk,
Sum(ss_ext_sales_price) AS sales,
Sum(ss_net_profit) AS profit
FROM store_sales,
date_dim,
store
WHERE ss_sold_date_sk = d_date_sk
AND d_date BETWEEN Cast('2000-08-23' AS DATE) AND (
Cast('2000-08-23' AS DATE) + interval '30' day)
AND ss_store_sk = s_store_sk
GROUP BY s_store_sk), sr AS
(
SELECT s_store_sk,
sum(sr_return_amt) AS returns,
sum(sr_net_loss) AS profit_loss
FROM store_returns,
date_dim,
store
WHERE sr_returned_date_sk = d_date_sk
AND d_date BETWEEN cast('2000-08-23' AS date) AND (
cast('2000-08-23' AS date) + interval '30' day)
AND sr_store_sk = s_store_sk
GROUP BY s_store_sk), cs AS
(
SELECT cs_call_center_sk,
sum(cs_ext_sales_price) AS sales,
sum(cs_net_profit) AS profit
FROM catalog_sales,
date_dim
WHERE cs_sold_date_sk = d_date_sk
AND d_date BETWEEN cast('2000-08-23' AS date) AND (
cast('2000-08-23' AS date) + interval '30' day)
GROUP BY cs_call_center_sk), cr AS
(
SELECT cr_call_center_sk,
sum(cr_return_amount) AS returns,
sum(cr_net_loss) AS profit_loss
FROM catalog_returns,
date_dim
WHERE cr_returned_date_sk = d_date_sk
AND d_date BETWEEN cast('2000-08-23' AS date) AND (
cast('2000-08-23' AS date) + interval '30' day)
GROUP BY cr_call_center_sk), ws AS
(
SELECT wp_web_page_sk,
sum(ws_ext_sales_price) AS sales,
sum(ws_net_profit) AS profit
FROM web_sales,
date_dim,
web_page
WHERE ws_sold_date_sk = d_date_sk
AND d_date BETWEEN cast('2000-08-23' AS date) AND (
cast('2000-08-23' AS date) + interval '30' day)
AND ws_web_page_sk = wp_web_page_sk
GROUP BY wp_web_page_sk), wr AS
(
SELECT wp_web_page_sk,
sum(wr_return_amt) AS returns,
sum(wr_net_loss) AS profit_loss
FROM web_returns,
date_dim,
web_page
WHERE wr_returned_date_sk = d_date_sk
AND d_date BETWEEN cast('2000-08-23' AS date) AND (
cast('2000-08-23' AS date) + interval '30' day)
AND wr_web_page_sk = wp_web_page_sk
GROUP BY wp_web_page_sk)
SELECT channel,
id,
sum(sales) AS sales,
sum(returns) AS returns,
sum(profit) AS profit
FROM (
SELECT 'store channel' AS channel,
ss.s_store_sk AS id,
sales,
COALESCE(returns, 0) AS returns,
(profit - COALESCE(profit_loss,0)) AS profit
FROM ss
LEFT JOIN sr
ON ss.s_store_sk = sr.s_store_sk
UNION ALL
SELECT 'catalog channel' AS channel,
cs_call_center_sk AS id,
sales,
returns,
(profit - profit_loss) AS profit
FROM cs
CROSS JOIN cr
UNION ALL
SELECT 'web channel' AS channel,
ws.wp_web_page_sk AS id,
sales,
COALESCE(returns, 0) returns,
(profit - COALESCE(profit_loss,0)) AS profit
FROM ws
LEFT JOIN wr
ON ws.wp_web_page_sk = wr.wp_web_page_sk ) x
GROUP BY rollup(channel, id)
ORDER BY channel,
id limit 100
{code}
> support Dynamic Partition Pruning in Adaptive Execution
> -------------------------------------------------------
>
> Key: SPARK-30186
> URL: https://issues.apache.org/jira/browse/SPARK-30186
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Xiaoju Wu
> Priority: Major
>
> Currently Adaptive Execution cannot work if Dynamic Partition Pruning is applied.
> private def supportAdaptive(plan: SparkPlan): Boolean = {
> // TODO migrate dynamic-partition-pruning onto adaptive execution.
> sanityCheck(plan) &&
> !plan.logicalLink.exists(_.isStreaming) &&
> *!plan.expressions.exists(_.find(_.isInstanceOf[DynamicPruningSubquery]).isDefined)* &&
> plan.children.forall(supportAdaptive)
> }
> It means we cannot benefit the performance from both AE and DPP.
> This ticket is target to make DPP + AE works together.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org