Posted to issues@spark.apache.org by "Kent Yao (Jira)" <ji...@apache.org> on 2019/09/26 08:11:00 UTC

[jira] [Created] (SPARK-29257) All Task attempts scheduled to the same executor inevitably access the same bad disk

Kent Yao created SPARK-29257:
--------------------------------

             Summary: All Task attempts scheduled to the same executor inevitably access the same bad disk
                 Key: SPARK-29257
                 URL: https://issues.apache.org/jira/browse/SPARK-29257
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 2.4.4, 2.3.4
            Reporter: Kent Yao


We have an HDFS/YARN cluster with about 2k~3k nodes; each node has 8T * 12 local disks for storage and shuffle. Sometimes one or more disks go bad during a computation. Sometimes this causes job-level failures, and sometimes it does not.

The following picture shows one failed job in which 4 task attempts were all scheduled to the same node and failed with almost the same exception while writing the index temporary file to the same bad disk.

There are two reasons for this:
 # As shown in the figure, this node has the best data locality for reading the input from HDFS. With the default spark.locality.wait (3s) in effect, there is a high probability that the retried attempts will be scheduled to this same node.
 # The index (and data) file name for a particular shuffle map task is fixed: it is built from the shuffle id, the map id and the noop reduce id, which is always 0. The root local dir is picked by taking the fixed file name's non-negative hash code modulo the number of disks, so the chosen disk is fixed as well. Even with 12 disks in total and only one of them broken, once the broken disk has been picked for a task, every following attempt of that task will inevitably pick the same broken disk (see the sketch below).
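A minimal, self-contained sketch of the second point, assuming the index file name has the form shuffle_<shuffleId>_<mapId>_0.index and that the root local dir is chosen by the file name's non-negative hash code modulo the number of local dirs (the object and helper names below are illustrative, not the exact Spark code):

{code:scala}
// Illustrative sketch only: shows why a fixed shuffle file name always maps
// to the same local dir index, no matter how many times the task is retried.
object FixedDiskPick {

  // Non-negative hash; math.abs(Int.MinValue) would overflow, so guard it.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h != Int.MinValue) math.abs(h) else 0
  }

  // Pick the root local dir (disk) index for a given file name.
  def pickLocalDir(fileName: String, numLocalDirs: Int): Int =
    nonNegativeHash(fileName) % numLocalDirs

  def main(args: Array[String]): Unit = {
    val numLocalDirs = 12            // e.g. 12 disks configured per node
    val shuffleId = 3
    val mapId = 42
    // Index file name for this map task; the reduce id part is always 0.
    val indexFile = s"shuffle_${shuffleId}_${mapId}_0.index"

    // Every attempt of the same map task produces the same file name,
    // so it always resolves to the same disk index.
    (1 to 4).foreach { attempt =>
      println(s"attempt $attempt -> local dir index ${pickLocalDir(indexFile, numLocalDirs)}")
    }
  }
}
{code}

All four attempts print the same dir index, which is exactly the behavior described above when that particular disk happens to be the broken one.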

!image-2019-09-26-15-35-29-342.png!



