Posted to issues@spark.apache.org by "Ke Jia (JIRA)" <ji...@apache.org> on 2018/12/09 03:35:00 UTC

[jira] [Updated] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

     [ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Jia updated SPARK-26155:
---------------------------
    Attachment: tpcds.result.xlsx

> Spark SQL  performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26155
>                 URL: https://issues.apache.org/jira/browse/SPARK-26155
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Ke Jia
>            Priority: Major
>         Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis in Spark2.3 without L486&487.pdf, q19.sql, tpcds.result.xlsx
>
>
> In our test environment, we found a serious performance degradation in Spark 2.3 when running TPC-DS on SKX 8180. Several queries are affected. For example, on 3TB of data, TPC-DS Q19 takes 126 seconds with Spark 2.3 but only 29 seconds with Spark 2.1. We investigated and found that the root cause is the community patch SPARK-21052, which adds metrics to the hash join process. The affected code is [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] and [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487]. Q19 takes about 30 seconds without these two lines of code and 126 seconds with them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org