Posted to issues@spark.apache.org by "StanZhai (JIRA)" <ji...@apache.org> on 2017/02/13 08:07:41 UTC
[jira] [Updated] (SPARK-19532) [Core]`DataStreamer for file`
threads of DFSOutputStream leak if set `spark.speculation` to true
[ https://issues.apache.org/jira/browse/SPARK-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
StanZhai updated SPARK-19532:
-----------------------------
Description:
When `spark.speculation` is set to true, I found about 1300 threads named "DataStreamer for file /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_000069_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet" in the TIMED_WAITING state on the Executor thread dump page of the WebUI.
{code}
java.lang.Object.wait(Native Method)
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
{code}
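For reference, a hedged reproduction sketch (the jar name and multiplier value are illustrative, not from the report) of enabling the speculative execution that triggers the leak:

```shell
# Hypothetical sketch: enable speculative execution, the condition under which
# the leaked DataStreamer threads were observed. The application jar and the
# multiplier value are placeholders, not taken from this report.
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.multiplier=1.5 \
  my_job.jar
```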
Off-heap memory usage keeps growing until the Executor exits with an OOM exception.
This problem occurs only when writing data to Hadoop (tasks may be killed by the Executor during the write).
Could this be related to [https://issues.apache.org/jira/browse/HDFS-9812]?
The version of Hadoop is 2.6.4.
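To illustrate the suspected mechanism, here is a minimal, self-contained sketch (assumption: `MockStreamer` is a simplified stand-in I wrote, not the actual HDFS `DFSOutputStream$DataStreamer`) of how a background streamer thread stays in TIMED_WAITING forever if a killed task never calls close() on its output stream:

```java
// Minimal sketch of the suspected leak: a DataStreamer-style background thread
// parks in TIMED_WAITING until close() is called. If a speculative task is
// killed mid-write and close() never runs, the thread leaks.
// MockStreamer is an illustrative stand-in, NOT the real HDFS class.
public class StreamerLeakDemo {
    static class MockStreamer implements Runnable {
        private final Object lock = new Object();
        private volatile boolean closed = false;

        public void run() {
            synchronized (lock) {
                while (!closed) {
                    try {
                        lock.wait(1000); // TIMED_WAITING, like the ~1300 reported threads
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }

        public void close() {
            synchronized (lock) {
                closed = true;
                lock.notifyAll();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        MockStreamer streamer = new MockStreamer();
        Thread t = new Thread(streamer, "DataStreamer for file /demo.parquet");
        t.start();
        Thread.sleep(100);
        System.out.println("before close, alive: " + t.isAlive());
        streamer.close(); // the defensive pattern: always close, even when the task is killed
        t.join(2000);
        System.out.println("after close, alive: " + t.isAlive());
    }
}
```

If close() is never invoked (as may happen when the Executor kills a speculative task during a write), the thread remains alive indefinitely, matching the accumulation seen in the thread dump.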
was:
When `spark.speculation` is set to true, I found about 1300 threads named "DataStreamer for file /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_000069_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet" in the TIMED_WAITING state on the Executor thread dump page of the WebUI.
{code}
java.lang.Object.wait(Native Method)
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
{code}
Off-heap memory usage keeps growing until the Executor exits with an OOM exception.
This problem occurs only when writing data to Hadoop (tasks may be killed by the Executor during the write).
> [Core]`DataStreamer for file` threads of DFSOutputStream leak if set `spark.speculation` to true
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-19532
> URL: https://issues.apache.org/jira/browse/SPARK-19532
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.1.0
> Reporter: StanZhai
> Priority: Blocker
>
> When `spark.speculation` is set to true, I found about 1300 threads named "DataStreamer for file /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_000069_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet" in the TIMED_WAITING state on the Executor thread dump page of the WebUI.
> {code}
> java.lang.Object.wait(Native Method)
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
> {code}
> Off-heap memory usage keeps growing until the Executor exits with an OOM exception.
> This problem occurs only when writing data to Hadoop (tasks may be killed by the Executor during the write).
> Could this be related to [https://issues.apache.org/jira/browse/HDFS-9812]?
> The version of Hadoop is 2.6.4.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org