Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2018/02/02 02:20:00 UTC

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

    [ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349679#comment-16349679 ] 

Yin Huai commented on SPARK-23310:
----------------------------------

[~sitalkedia@gmail.com] We found that the commit for SPARK-21113 introduced a noticeable regression. Because Q95 is a join-heavy query, which represents one set of common workloads, I am concerned that this regression is quite easy for users of Spark 2.3 to hit. Considering that setting spark.unsafe.sorter.spill.read.ahead.enabled to false improves the overall performance of all TPC-DS queries, how about we set spark.unsafe.sorter.spill.read.ahead.enabled to false by default in Spark 2.3? Then, we can look into how to resolve this regression for Spark 2.4. What do you think?

(Feel free to enable it for your own workloads; that feedback will definitely help Spark improve this part :) )
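For anyone hitting this, the workaround discussed above can be applied per application at submit time rather than waiting for a default change. A minimal sketch (the config key is from this ticket; the class and jar names are illustrative placeholders, not from the ticket):

```shell
# Disable spill read-ahead (added by SPARK-21113) to avoid the
# ReadAheadInputStream lock contention described in this ticket.
# com.example.MyApp and my-app.jar are placeholders for your own app.
spark-submit \
  --conf spark.unsafe.sorter.spill.read.ahead.enabled=false \
  --class com.example.MyApp \
  my-app.jar
```

The same key can also be set in spark-defaults.conf or on SparkConf before the SparkContext is created; it is a core (not SQL) setting, so it cannot be toggled at runtime per query.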

> Perf regression introduced by SPARK-21113
> -----------------------------------------
>
>                 Key: SPARK-23310
>                 URL: https://issues.apache.org/jira/browse/SPARK-23310
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Yin Huai
>            Priority: Blocker
>
> While running all TPC-DS queries with SF set to 1000, we noticed that Q95 (https://github.com/databricks/spark-sql-perf/blob/master/src/main/resources/tpcds_2_4/q95.sql) has a noticeable regression (11%). After looking into it, we found that the regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream suffers from lock contention. After setting spark.unsafe.sorter.spill.read.ahead.enabled to false, the regression disappears and the overall performance of all TPC-DS queries improves.
>  
> I am proposing that we set spark.unsafe.sorter.spill.read.ahead.enabled to false by default for Spark 2.3 and re-enable it after addressing the lock contention issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org