Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2019/09/18 16:07:00 UTC

[jira] [Updated] (SPARK-26713) PipedRDD may hold stdin writer and stdout reader threads even if the task is finished

     [ https://issues.apache.org/jira/browse/SPARK-26713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26713:
----------------------------------
    Fix Version/s: 2.4.5

> PipedRDD may hold stdin writer and stdout reader threads even if the task is finished
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-26713
>                 URL: https://issues.apache.org/jira/browse/SPARK-26713
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.3, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Xianjin YE
>            Assignee: Xianjin YE
>            Priority: Major
>             Fix For: 2.4.5, 3.0.0
>
>
> While investigating an OOM in one of our internal production jobs, I found that PipedRDD leaks memory. After some digging, the problem comes down to the fact that PipedRDD doesn't release its stdin writer and stdout reader threads even after the task has finished.
>  
> PipedRDD creates two threads: a stdin writer and a stdout reader. If the task finishes normally, these two threads exit normally. But if the subprocess (the piped command) fails, the task is marked as failed while the stdin writer keeps running until it has consumed its parent RDD's iterator. There is even a race condition with ShuffledRDD + PipedRDD: the ShuffleBlockFetcherIterator is cleaned up at task completion, which hangs the stdin writer thread and leaks memory.
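
For context, here is a minimal Scala sketch of the pattern the report describes. It is illustrative only, not the actual Spark source: the helper object and method names are invented, and real PipedRDD logic lives in its compute() method. The point it shows is that the stdin writer thread blocks on the parent iterator, so if the subprocess dies early the thread stays alive unless it is explicitly joined or interrupted when the task completes.

{code:scala}
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

// Illustrative stand-in for the PipedRDD pattern (hypothetical names).
object PipedSketch {
  def pipe(parent: Iterator[String], command: Seq[String]): Seq[String] = {
    val proc = new ProcessBuilder(command: _*).start()

    // stdin writer thread: feeds the parent iterator into the subprocess.
    // If the subprocess exits early, this thread stays blocked on `parent`
    // (e.g. a ShuffleBlockFetcherIterator) unless it is cleaned up below.
    val stdinWriter = new Thread("stdin-writer") {
      override def run(): Unit = {
        val out = new PrintWriter(proc.getOutputStream)
        try parent.foreach(line => out.println(line))
        finally out.close()
      }
    }
    stdinWriter.setDaemon(true)
    stdinWriter.start()

    // stdout reader: drain the subprocess output on the calling thread.
    val in = new BufferedReader(new InputStreamReader(proc.getInputStream))
    val lines =
      try Iterator.continually(in.readLine()).takeWhile(_ != null).toVector
      finally in.close()

    // The cleanup this issue is about: without joining/interrupting the
    // writer when the task ends, a failed subprocess leaves the thread
    // running and pins whatever the parent iterator still holds in memory.
    proc.waitFor()
    stdinWriter.interrupt()
    stdinWriter.join(10000L)
    lines
  }

  def main(args: Array[String]): Unit =
    pipe(Iterator("a", "bb", "ccc"), Seq("wc", "-l")).foreach(println)
}
{code}

The fix for SPARK-26713 ties this kind of cleanup to task completion rather than leaving the helper threads to run until the parent iterator is exhausted.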



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org