Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2022/04/11 07:08:00 UTC

[jira] [Updated] (SPARK-38853) optimizeSkewsInRebalancePartitions has performance issue

     [ https://issues.apache.org/jira/browse/SPARK-38853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-38853:
--------------------------------
    Description: 
How to reproduce this issue:
{code:sql}
CREATE TABLE t USING PARQUET
AS
SELECT
    /*+ REBALANCE */
	A.SESSION_START_DT
	, COALESCE(A.SITE_ID,0) AS SITE_ID
	, A.GUID
	, COALESCE(CAST(A.SESSION_SKEY AS BIGINT),0) AS SESSION_SKEY
	, COALESCE(CAST(A.SEQNUM AS INT),0) AS SEQNUM
	
	, COALESCE(A.IMP_PAGE_ID,0) AS IMP_PAGE_ID
	, COALESCE(A.PLACEMENT_ID,0) AS PLACEMENT_ID
	, A.PRODUCT_LINE_CODE
	, A.ALGORITHM_ID
	, A.MEID
	, A.ALGO_OUTPUT_ITEMS
	, A.CLICKS
	, A.GMV_7D
FROM big_partition_table A
WHERE
	DT BETWEEN DATE_FORMAT(DATE_SUB(CURRENT_DATE,11), 'yyyyMMdd') AND DATE_FORMAT(DATE_ADD(DATE_SUB(CURRENT_DATE,11),0), 'yyyyMMdd')
	AND TO_DATE(from_unixtime(unix_timestamp(A.SESSION_START_DT, 'yyyy/MM/dd'))) = DATE_SUB(CURRENT_DATE,11)
	AND ICFBOT = '00';
{code}

Enable :



> optimizeSkewsInRebalancePartitions has performance issue
> --------------------------------------------------------
>
>                 Key: SPARK-38853
>                 URL: https://issues.apache.org/jira/browse/SPARK-38853
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major


