Posted to issues@spark.apache.org by "Mridul Muralidharan (JIRA)" <ji...@apache.org> on 2015/12/02 01:02:11 UTC

[jira] [Comment Edited] (SPARK-11801) Notify driver when OOM is thrown before executor JVM is killed

    [ https://issues.apache.org/jira/browse/SPARK-11801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034914#comment-15034914 ] 

Mridul Muralidharan edited comment on SPARK-11801 at 12/2/15 12:01 AM:
-----------------------------------------------------------------------

There are a few aspects here:
a) There is a race between the OOM being thrown and (kill being invoked + the SIGINT handler starting shutdown).
b) Whether we actually see the OOM at all (libraries and user code are known to swallow them, for example).
c) If we do see the OOM, whether we have sufficient time to do anything about it.
d) An OOM can be thrown in any thread that requests memory - this includes executor threads, Spark daemon threads, Hadoop threads (dfs, yarn, etc.) and others. An OOM thrown in any of them causes kill to be executed.
e) When the VM exhausts memory, or when kill is executed, we enter a phase of the VM lifecycle where things are slightly unstable and unexpected failures can occur.
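
To illustrate point (b), here is a minimal Java sketch of how an OOM can be swallowed before the task code ever sees it. The class and method names (SwallowedOomDemo, runLibraryCall, taskSawOom) are hypothetical and not Spark code, and the OutOfMemoryError is constructed by hand to keep the example deterministic, so it would not trigger the JVM's own OOM handling the way a real allocation failure does.

```java
public class SwallowedOomDemo {
    // Hypothetical stand-in for library or user code that catches Throwable too broadly.
    public static String runLibraryCall(Runnable work) {
        try {
            work.run();
            return "ok";
        } catch (Throwable t) {
            // OutOfMemoryError is a Throwable, so it is caught and hidden here.
            return "swallowed " + t.getClass().getSimpleName();
        }
    }

    // Hypothetical task wrapper: it can only report an OOM that actually reaches it.
    public static boolean taskSawOom(Runnable work) {
        try {
            work.run();
            return false;
        } catch (OutOfMemoryError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Simulated failure; a real OOM would come from a failed allocation.
        Runnable failing = () -> { throw new OutOfMemoryError("simulated"); };

        // The error propagates directly, so the wrapper sees it.
        System.out.println(taskSawOom(failing));

        // The same error raised inside the broad catch never reaches the wrapper.
        System.out.println(taskSawOom(() -> runLibraryCall(failing)));
    }
}
```

In the second call the wrapper has nothing to report, which is the situation where any "notify the driver on OOM" logic in the task would silently do nothing.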


Given all of this, I am very unsure about trying to handle OOM. In particular, if we have code that handles it and sends a message to the driver, devs/users will start expecting it to work, which makes things more confusing all around whenever it does not. But that is a personal opinion, since I am used to mucking through logs :-)

Sidenote: Threads don't need to be created at shutdown - shutdown hooks are registered as threads that are simply not scheduled yet (start has not been invoked, but the native init and other expensive bits are already done).
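
As a rough sketch of the sidenote, the standard Runtime.addShutdownHook API takes an already constructed Thread and leaves it unstarted until the JVM begins shutting down. The class name, the registerHook helper and the thread name below are hypothetical, chosen only for illustration; the sketch shows only the registered-but-not-started state and makes no claim about what the JVM has or has not initialized internally.

```java
public class ShutdownHookDemo {
    // Hypothetical helper: builds the Thread object up front and registers it as a hook.
    public static Thread registerHook(Runnable action) {
        Thread hook = new Thread(action, "oom-notify-hook"); // thread name is an assumption
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        Thread hook = registerHook(() -> System.out.println("shutdown hook running"));

        // The thread exists and is registered, but start() has not been invoked yet.
        System.out.println(hook.getState()); // NEW
        System.out.println(hook.isAlive());  // false

        // When main returns and the JVM shuts down normally, the JVM starts the hook.
    }
}
```

So no new Thread object has to be allocated at shutdown time; whether the hook then gets enough time and memory to do useful work is the separate concern raised in (c) and (e).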







> Notify driver when OOM is thrown before executor JVM is killed 
> ---------------------------------------------------------------
>
>                 Key: SPARK-11801
>                 URL: https://issues.apache.org/jira/browse/SPARK-11801
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Srinivasa Reddy Vundela
>            Priority: Minor
>
> Here is some background for the issue.
> A customer got an OOM exception in one of the tasks, and the executor was killed with kill %p. It is unclear from the driver logs/Spark UI why the task or executor was lost. The customer has to look into the executor logs to see that OOM was the cause of the task/executor loss.
> It would be helpful if the driver logs/Spark UI showed the reason for the task failure, by making sure that the task updates the driver with the OOM.


