Posted to issues@spark.apache.org by "HongJin (Jira)" <ji...@apache.org> on 2020/03/26 02:40:00 UTC

[jira] [Created] (SPARK-31260) How to speed up WholeStageCodegen in Spark SQL Query?

HongJin created SPARK-31260:
-------------------------------

             Summary: How to speed up WholeStageCodegen in Spark SQL Query?
                 Key: SPARK-31260
                 URL: https://issues.apache.org/jira/browse/SPARK-31260
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: HongJin


It takes about 2 minutes for one 248 MB file, and about 5 minutes for two files. How can I tune or maximize the performance?

Initialize spark as below:

.setMaster(numCores)
.set("spark.driver.host", "localhost")
.set("spark.executor.cores", "2")
.set("spark.num.executors", "2")
.set("spark.executor.memory", "4g")
.set("spark.dynamicAllocation.enabled", "true")
.set("spark.dynamicAllocation.minExecutors", "2")
.set("spark.dynamicAllocation.maxExecutors", "2")
.set("spark.ui.enabled", "true")
.set("spark.sql.shuffle.partitions", defaultPartitions)
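
A minimal sketch of the same setup as a complete, compilable snippet (the `"local[2]"` master and the literal partition count are assumptions, not values from the post). Two things in the original stand out: `spark.num.executors` is not a recognized Spark configuration key (the standard one is `spark.executor.instances`), and dynamic allocation with `minExecutors == maxExecutors` cannot actually scale, so it is effectively static allocation:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hedged sketch: master URL and partition count are assumed values.
val conf = new SparkConf()
  .setMaster("local[2]")
  .set("spark.driver.host", "localhost")
  .set("spark.executor.cores", "2")
  // "spark.num.executors" is not a recognized key;
  // the standard setting is "spark.executor.instances".
  .set("spark.executor.instances", "2")
  .set("spark.executor.memory", "4g")
  // With min == max executors, dynamic allocation cannot scale,
  // so it adds no benefit here and can be disabled.
  .set("spark.dynamicAllocation.enabled", "false")
  .set("spark.ui.enabled", "true")
  .set("spark.sql.shuffle.partitions", "8")

val spark = SparkSession.builder().config(conf).getOrCreate()
```

Lowering `spark.sql.shuffle.partitions` from the default of 200 is often the single biggest win for small inputs like a 248 MB file, since each shuffle partition carries per-task overhead.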


joinedDf = upperCaseLeft.as("l")
  .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer")
  .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col, toleranceValue, caseSensitive)): _*)


data = joinedDf.take(1000)
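
Note that `take(1000)` on a full outer join cannot short-circuit: both sides must be fully shuffled and matched before any rows can be returned, so the whole join runs even though only 1000 rows are kept. A sketch of how to see where the time goes and avoid recomputation (assuming `joinedDf` as defined above):

```scala
// Print the physical plan; operators fused into WholeStageCodegen
// appear under a "*(n)" / WholeStageCodegen node in the output.
joinedDf.explain(true)

// take(1000) still executes the full shuffle for a full_outer join.
// Caching the joined result avoids recomputing it on later actions.
val cached = joinedDf.cache()
val data = cached.take(1000)
```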


[https://i.stack.imgur.com/oeYww.png]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org