Posted to issues@spark.apache.org by "Suraj Sharma (Jira)" <ji...@apache.org> on 2020/06/16 19:41:00 UTC

[jira] [Created] (SPARK-32007) Spark Driver Supervise does not work reliably

Suraj Sharma created SPARK-32007:
------------------------------------

             Summary: Spark Driver Supervise does not work reliably
                 Key: SPARK-32007
                 URL: https://issues.apache.org/jira/browse/SPARK-32007
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.4.4
         Environment: ||Name||Value||
|Java Version|1.8.0_121 (Oracle Corporation)|
|Java Home|/usr/java/jdk1.8.0_121/jre|
|Scala Version|version 2.11.12|
|OS|Amazon Linux|
            Reporter: Suraj Sharma


I have a standalone cluster setup; this is NOT a streaming use case. The Spark master and worker processes run on AWS EC2 machines.

*Problem*: If a worker machine that is running drivers and executors dies, the drivers are not relaunched on other healthy machines.

*Below are my findings:*
||Action/Behaviour||Executor||Driver||
|Worker machine stopped|Relaunches on an active machine|NO relaunch|
|{{kill -9}} on the process|Relaunches on another machine|Relaunches on another machine|
|{{kill}} on the process|Relaunches on another machine|Relaunches on another machine|

*Cluster Setup:*
 # Spark standalone cluster
 # {{spark.driver.supervise=true}}
 # Spark Master HA is enabled, backed by ZooKeeper
 # Spark version 2.4.4
 # The spark worker process is managed by a systemd unit
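For reference, a minimal sketch of how a driver would be submitted so that supervision applies. The master URL, class name, and jar path below are placeholders, not taken from the original report:

```shell
# Submit the driver in standalone cluster mode with supervision enabled.
# --supervise corresponds to spark.driver.supervise=true.
# master-host, com.example.MyApp, and the jar path are hypothetical.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  /path/to/my-app.jar
```

Per the findings table above, killing the driver process on a live worker does trigger a relaunch under this setup; only the loss of the entire worker machine fails to do so.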



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org