Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2022/04/11 07:17:00 UTC
[jira] [Commented] (SPARK-38853) optimizeSkewsInRebalancePartitions has performance issue
[ https://issues.apache.org/jira/browse/SPARK-38853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520345#comment-17520345 ]
Yuming Wang commented on SPARK-38853:
-------------------------------------
Config:
{noformat}
spark.master yarn
spark.driver.maxResultSize 4g
spark.driver.memory 20g
spark.executor.cores 5
spark.executor.instances 200
spark.executor.memory 15g
spark.sql.adaptive.coalescePartitions.initialPartitionNum 10000
spark.sql.adaptive.coalescePartitions.minPartitionNum 200
spark.sql.adaptive.advisoryPartitionSizeInBytes 100m
{noformat}
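To compare the two runs from the issue description under this config, the rule can be toggled per session. A minimal sketch, assuming the flag is spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled (the property name is not stated in this thread) and that adaptive query execution is already on:
{code:sql}
-- Assumed property name; it does not appear in this thread.
-- Turn the skew optimization for REBALANCE off for the current session.
SET spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled=false;
-- Re-run the CREATE TABLE ... /*+ REBALANCE */ statement from the issue here,
-- then switch the rule back on and run it again to compare timings.
SET spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled=true;
{code}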
> optimizeSkewsInRebalancePartitions has performance issue
> --------------------------------------------------------
>
> Key: SPARK-38853
> URL: https://issues.apache.org/jira/browse/SPARK-38853
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Yuming Wang
> Priority: Major
> Attachments: Disable.png, enable.png
>
>
> How to reproduce this issue:
> {code:sql}
> CREATE TABLE t USING PARQUET
> AS
> SELECT
> /*+ REBALANCE */
> A.SESSION_START_DT
> , COALESCE(A.SITE_ID,0) AS SITE_ID
> , A.GUID
> , COALESCE(CAST(A.SESSION_SKEY AS BIGINT),0) AS SESSION_SKEY
> , COALESCE(CAST(A.SEQNUM AS INT),0) AS SEQNUM
>
> , COALESCE(A.IMP_PAGE_ID,0) AS IMP_PAGE_ID
> , COALESCE(A.PLACEMENT_ID,0) AS PLACEMENT_ID
> , A.PRODUCT_LINE_CODE
> , A.ALGORITHM_ID
> , A.MEID
> , A.ALGO_OUTPUT_ITEMS
> , A.CLICKS
> , A.GMV_7D
> FROM big_partition_table A
> WHERE
> DT BETWEEN DATE_FORMAT(DATE_SUB(CURRENT_DATE,11), 'yyyyMMdd') AND DATE_FORMAT(DATE_ADD(DATE_SUB(CURRENT_DATE,11),0), 'yyyyMMdd')
> AND TO_DATE(from_unixtime(unix_timestamp(A.SESSION_START_DT, 'yyyy/MM/dd'))) = DATE_SUB(CURRENT_DATE,11)
> AND ICFBOT = '00';
> {code}
> With optimizeSkewsInRebalancePartitions enabled, the query takes more than 2 hours and the driver hangs:
> !enable.png!
> With optimizeSkewsInRebalancePartitions disabled, the same query takes only 29 minutes:
> !Disable.png!