You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Thomas Graves (Jira)" <ji...@apache.org> on 2020/03/31 13:03:00 UTC

[jira] [Updated] (SPARK-31314) Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

     [ https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated SPARK-31314:
----------------------------------
    Issue Type: Bug  (was: Improvement)

> Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31314
>                 URL: https://issues.apache.org/jira/browse/SPARK-31314
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Yuanjian Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In SPARK-29285, we change to create shuffle temporary eagerly. This is helpful for not to fail the entire task in the scenario of occasional disk failure.
> But for the applications that many tasks don't actually create shuffle files, it caused overhead. See the below benchmark:
> Env: Spark local-cluster[2, 4, 19968], each queries run 5 round, each round 5 times.
> Data: TPC-DS scale=99 generate by spark-tpcds-datagen
> Results:
> || ||Base||Revert||
> |Q20|Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606|Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463|
> |Q33|Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136|Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276|
> |Q52|Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871|Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108|
> |Q56|Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579|Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org