Posted to issues@spark.apache.org by "Suraj Sharma (Jira)" <ji...@apache.org> on 2020/06/16 19:41:00 UTC
[jira] [Created] (SPARK-32007) Spark Driver Supervise does not work reliably
Suraj Sharma created SPARK-32007:
------------------------------------
Summary: Spark Driver Supervise does not work reliably
Key: SPARK-32007
URL: https://issues.apache.org/jira/browse/SPARK-32007
Project: Spark
Issue Type: Question
Components: Spark Core
Affects Versions: 2.4.4
Environment: ||Name||Value||
|Java Version|1.8.0_121 (Oracle Corporation)|
|Java Home|/usr/java/jdk1.8.0_121/jre|
|Scala Version|version 2.11.12|
|OS|Amazon Linux|
Reporter: Suraj Sharma
I have a standalone cluster setup. I DO NOT have a streaming use case. I use AWS EC2 machines for the Spark master and worker processes.
*Problem*: If a Spark worker machine that is running drivers and executors dies, the drivers are not relaunched on other healthy machines.
*Below are my findings:*
||Action||Executor||Driver||
|Worker machine stopped|Relaunches on an active machine|NO relaunch|
|{{kill -9}} on process|Relaunches on other machines|Relaunches on other machines|
|{{kill}} on process|Relaunches on other machines|Relaunches on other machines|
*Cluster Setup:*
# I have a Spark standalone cluster
# {{spark.driver.supervise=true}} is set
# Spark Master HA is enabled and backed by ZooKeeper
# Spark version = 2.4.4
# The Spark worker process is managed by a systemd unit
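For reference, driver supervision on a standalone cluster is requested at submit time. A minimal sketch of the submit command for a setup like this one (the master host names, main class, and jar path below are illustrative placeholders, not taken from this report):

```shell
# Submit in cluster deploy mode against the standalone master(s).
# --supervise (equivalent to spark.driver.supervise=true) asks the master
# to restart the driver if it exits with a non-zero status.
# NOTE: master hosts, class name, and jar path are placeholders.
spark-submit \
  --master spark://master1:7077,master2:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  /path/to/my-app.jar
```

Note that supervision only applies with {{--deploy-mode cluster}}; in client mode the driver runs in the submitting process and the master cannot relaunch it.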
--
This message was sent by Atlassian Jira
(v8.3.4#803005)