You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yuanjian Li (Jira)" <ji...@apache.org> on 2020/03/31 09:10:00 UTC

[jira] [Created] (SPARK-31314) Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

Yuanjian Li created SPARK-31314:
-----------------------------------

             Summary: Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly
                 Key: SPARK-31314
                 URL: https://issues.apache.org/jira/browse/SPARK-31314
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Yuanjian Li


In SPARK-29285, we change to create shuffle temporary eagerly. This is helpful for not to fail the entire task in the scenario of occasional disk failure.

But for the applications that many tasks don't actually create shuffle files, it caused overhead. See the below benchmark:

Env: Spark local-cluster[2, 4, 19968], each queries run 5 round, each round 5 times.
Data: TPC-DS scale=99 generate by spark-tpcds-datagen

Results:
|| ||Base||Revert||
|Q20|Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606|Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463|
|Q33|Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136|Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276|
|Q52|Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871|Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108|
|Q56|Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579|Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org