Posted to issues@spark.apache.org by "Ke Jia (JIRA)" <ji...@apache.org> on 2018/11/23 07:49:00 UTC

[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after applying SPARK-21052 with Q19 of TPC-DS at 3TB scale

    [ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696476#comment-16696476 ] 

Ke Jia commented on SPARK-26155:
--------------------------------

*Cluster info:*
| |*Master Node*|*Worker Nodes* |
|*Node*|1x |7x|
|*Processor*|Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz|Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz|
|*Memory*|192 GB|384 GB|
|*Storage Main*|8 x 960G SSD|8 x 960G SSD|
|*Network*|10GbE|
|*Role*|CM Management 
 NameNode
 Secondary NameNode
 Resource Manager
 Hive Metastore Server|DataNode
 NodeManager|
|*OS Version*|CentOS 7.2|
|*Hadoop*|Apache Hadoop 2.7.5|
|*Hive*|Apache Hive 2.2.0|
|*Spark*|Apache Spark 2.1.0 vs. Apache Spark 2.3.0|
|*JDK version*|1.8.0_112|

*Related parameters setting:*
|*Component*|*Parameter*|*Value*|
|*Yarn Resource Manager*|yarn.scheduler.maximum-allocation-mb|40GB|
|yarn.scheduler.minimum-allocation-mb|1GB|
|yarn.scheduler.maximum-allocation-vcores|121|
|yarn.resourcemanager.scheduler.class|Fair Scheduler|
|*Yarn Node Manager*|yarn.nodemanager.resource.memory-mb|40GB|
|yarn.nodemanager.resource.cpu-vcores|121|
|*Spark*|spark.executor.memory|34GB|
|spark.executor.cores|40|
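
The Spark executor settings in the table above could be passed at submission time roughly as follows. This is an illustrative sketch only; the application jar name and the YARN master/deploy-mode choices are assumptions, not taken from the test setup described here:

```shell
# Hypothetical submission command matching the table above:
#   spark.executor.memory = 34GB, spark.executor.cores = 40.
# The jar name (tpcds-q19.jar) is a placeholder, not from the original report.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 34g \
  --executor-cores 40 \
  tpcds-q19.jar
```

The YARN-side limits (yarn.scheduler.maximum-allocation-mb, yarn.nodemanager.resource.cpu-vcores, etc.) are cluster configuration and would live in yarn-site.xml rather than on the command line.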

> Spark SQL performance degradation after applying SPARK-21052 with Q19 of TPC-DS at 3TB scale
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26155
>                 URL: https://issues.apache.org/jira/browse/SPARK-26155
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Ke Jia
>            Priority: Major
>
> In our test environment, we found a serious performance regression in Spark 2.3 when running TPC-DS on SKX 8180: several queries degrade significantly. For example, TPC-DS Q19 takes 126 seconds with Spark 2.3 but only 29 seconds with Spark 2.1 on 3TB of data. We investigated the problem and traced the root cause to the community patch SPARK-21052, which adds metrics to the hash join process. The impacted code is [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] and [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487]. Q19 takes about 30 seconds without these two lines of code and 126 seconds with them.
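
The regression pattern described above, per-row metric bookkeeping on the hash-probe hot path, can be sketched outside Spark. The following Java micro-example is illustrative only (it is not the actual HashedRelation code; the class, method, and field names are invented): it shows that a counter incremented once per lookup never changes the join result, yet it executes on every probed row, which over billions of rows in a 3TB scan becomes measurable overhead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the SPARK-21052 pattern: a metric counter
// updated inside the per-row probe loop of a hash join.
public class MetricOverheadSketch {
    // Simulates a metric accumulator bumped once per key lookup.
    static long numKeyLookups = 0;

    // Probe loop WITH per-row metric bookkeeping on the hot path.
    static long probeWithMetrics(Map<Integer, Integer> table, int[] keys) {
        long matches = 0;
        for (int k : keys) {
            numKeyLookups += 1;               // extra work on every row
            if (table.containsKey(k)) matches++;
        }
        return matches;
    }

    // Identical probe loop WITHOUT the metric update.
    static long probeWithoutMetrics(Map<Integer, Integer> table, int[] keys) {
        long matches = 0;
        for (int k : keys) {
            if (table.containsKey(k)) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> table = new HashMap<>();
        for (int i = 0; i < 1_000; i++) table.put(i, i);
        int[] keys = new int[1_000_000];
        for (int i = 0; i < keys.length; i++) keys[i] = i % 2_000;

        long withMetrics = probeWithMetrics(table, keys);
        long withoutMetrics = probeWithoutMetrics(table, keys);
        // Both variants match the same rows; only the bookkeeping differs.
        System.out.println(withMetrics == withoutMetrics);
        // The metric was paid for on every single probe.
        System.out.println(numKeyLookups == keys.length);
    }
}
```

The cost per increment is tiny, but because it sits inside the innermost join loop it scales with row count rather than with query complexity, which is consistent with Q19 going from ~30s to 126s once the two metric lines are present.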



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
