Posted to issues@spark.apache.org by "Qi Zhu (Jira)" <ji...@apache.org> on 2021/07/28 07:05:00 UTC

[jira] [Commented] (SPARK-31314) Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

    [ https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388519#comment-17388519 ] 

Qi Zhu commented on SPARK-31314:
--------------------------------

cc [~XuanYuan] [~cloud_fan]

Since this has been reverted, we are hitting disk failures in our production clusters. How can we handle the failed-disk problem without it?

There are many disks on each node in our YARN clusters, but when one disk fails we just retry the task. Can we avoid retrying on the same failed disk within a node? Or does Spark have some disk blacklist solution now?
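For illustration only, a per-executor disk blacklist could look roughly like the sketch below. This is a hypothetical helper, not an existing Spark API; the object `DiskSelector` and its methods are invented names:
{code:scala}
import scala.collection.mutable

// Hypothetical sketch of a per-executor disk blacklist: remember local
// dirs whose writes threw I/O errors and prefer healthy dirs on retry.
// Not an existing Spark API; all names here are made up for illustration.
object DiskSelector {
  private val failedDirs = mutable.Set.empty[String]

  // Record a dir whose write failed with an I/O error.
  def markFailed(dir: String): Unit = failedDirs += dir

  // Pick the first healthy dir; fall back to any dir if all have failed.
  def pickDir(localDirs: Seq[String]): String =
    localDirs.find(d => !failedDirs.contains(d)).getOrElse(localDirs.head)
}
{code}
With something along these lines, a retried task could be steered to a different local dir after one returned an Input/output error, instead of hitting the same disk again.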

Also, the reverted change caused overhead for applications in which many tasks never actually create shuffle files. If we can find a workaround that avoids creating the temp shuffle files when tasks don't need them, I still think we should handle this.
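As a rough illustration of the lazy alternative (my own sketch, not the actual SPARK-29285 patch or the real DiskBlockObjectWriter): defer creating the file until the first write, so tasks that never write shuffle data pay no file-system cost.
{code:scala}
import java.io.{FileOutputStream, OutputStream}

// Sketch of lazy temp-shuffle-file creation (illustrative only): the
// file is created on the first write, so tasks that never write shuffle
// data never touch the disk at all.
class LazyTempFileWriter(path: String) {
  private var out: OutputStream = null

  def write(bytes: Array[Byte]): Unit = {
    if (out == null) out = new FileOutputStream(path) // create on first use
    out.write(bytes)
  }

  def created: Boolean = out != null

  def close(): Unit = if (out != null) out.close()
}
{code}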

The logs are: 
{code:java}
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, ********** 91): java.io.FileNotFoundException: /data2/yarn/local/usercache/aa/appcache/*****/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026 (Input/output error)
 at java.io.FileOutputStream.open0(Native Method)
 at java.io.FileOutputStream.open(FileOutputStream.java:270)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
 at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
 at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
 at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
 at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
 at org.apache.spark.scheduler.Task.run(Task.scala:121)
 at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}
Thanks.

> Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31314
>                 URL: https://issues.apache.org/jira/browse/SPARK-31314
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Yuanjian Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In SPARK-29285, we changed to create shuffle temporary files eagerly. This helps avoid failing the entire task in the scenario of an occasional disk failure.
> But for applications in which many tasks don't actually create shuffle files, it caused overhead. See the benchmark below:
> Env: Spark local-cluster[2, 4, 19968], each query runs 5 rounds, 5 times per round.
> Data: TPC-DS scale=99 generated by spark-tpcds-datagen
> Results:
> ||Query||Base||Revert||
> |Q20|Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606|Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463|
> |Q33|Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136|Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276|
> |Q52|Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871|Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108|
> |Q56|Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579|Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
