Posted to issues@spark.apache.org by "Saisai Shao (Jira)" <ji...@apache.org> on 2019/08/22 07:02:00 UTC
[jira] [Created] (SPARK-28849) Spark's UnsafeShuffleWriter may run
into infinite loop in transferTo occasionally
Saisai Shao created SPARK-28849:
-----------------------------------
Summary: Spark's UnsafeShuffleWriter may run into infinite loop in transferTo occasionally
Key: SPARK-28849
URL: https://issues.apache.org/jira/browse/SPARK-28849
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.1
Reporter: Saisai Shao
Spark's {{UnsafeShuffleWriter}} may occasionally run into an infinite loop when calling {{transferTo}}. What we saw is that, while merging the shuffle temp files, the task hung for several hours until it was killed manually. Here is the log; nothing was logged for several hours after the shuffle files were spilled to disk.
Here is the thread dump; it shows the thread calling the native method {{size0}}.
We also traced the process with strace and found that this thread was calling {{fstat}} continuously; here is the screenshot.
We have not found the root cause; my guess is that it is related to a file system or disk issue. In any case, we should find a way to fail fast in such a scenario.
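As a rough illustration of what failing fast could look like, here is a minimal sketch. It is not Spark's code. It assumes (unconfirmed, since the root cause is unknown) that the loop spins because {{transferTo}} keeps returning 0 without making progress, which would be consistent with the repeated {{size0}} / {{fstat}} calls. The class name {{FailFastTransfer}}, the method {{copy}}, and the {{maxStalledCalls}} parameter are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of Spark: a transferTo loop that throws
// instead of spinning forever when no progress is being made.
final class FailFastTransfer {
    private FailFastTransfer() {}

    /**
     * Copies bytesToCopy bytes from input, starting at startPosition, to output.
     * Throws IOException if the source has fewer bytes than requested, or if
     * transferTo returns 0 for maxStalledCalls consecutive calls.
     */
    static long copy(FileChannel input, long startPosition, long bytesToCopy,
                     WritableByteChannel output, int maxStalledCalls) throws IOException {
        long copied = 0L;
        int stalled = 0;
        while (copied < bytesToCopy) {
            long position = startPosition + copied;
            long n = input.transferTo(position, bytesToCopy - copied, output);
            if (n > 0) {
                copied += n;
                stalled = 0; // progress was made, reset the stall counter
                continue;
            }
            // transferTo transfers nothing when position is at or past the
            // end of the file, so check the current size before retrying.
            long size = input.size();
            if (position >= size) {
                throw new IOException("Source is shorter than expected: position="
                    + position + ", size=" + size + ", remaining=" + (bytesToCopy - copied));
            }
            stalled++;
            if (stalled >= maxStalledCalls) {
                throw new IOException("transferTo made no progress after "
                    + stalled + " consecutive calls at position " + position);
            }
        }
        return copied;
    }
}
```

With a guard like this, a task whose source file is shorter than the expected length would fail with an {{IOException}} and could be retried, rather than hanging until it is killed manually. The stall limit is an arbitrary knob for the sketch; a real fix might prefer a different policy.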
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org