Posted to issues@spark.apache.org by "Sergey (JIRA)" <ji...@apache.org> on 2017/03/10 12:45:04 UTC

[jira] [Updated] (SPARK-19900) [Standalone] Master registers application again when driver relaunched

     [ https://issues.apache.org/jira/browse/SPARK-19900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated SPARK-19900:
---------------------------
    Description: 
I've found a problem that occurs when the node where the driver is running has an unstable network. It can lead to a situation where two identical applications are running on the cluster at the same time.

*Steps to Reproduce:*
# Prepare 3 nodes: one for the Spark master and two for the Spark workers.
# Submit an application with the parameter spark.driver.supervise = true (an example spark-submit invocation is sketched after these steps).
# Go to the node where the driver is running (for example, spark-worker-1) and block port 7077:
{code}
# iptables -A OUTPUT -p tcp --dport 7077 -j DROP
{code}
# Wait at least 60 seconds.
# Look at the Spark master UI (a command-line check of the master state is sketched after these steps).
There are two Spark applications and one driver. The new application is in the WAITING state and the original application is in the RUNNING state. The driver is in the RUNNING or RELAUNCHING state (it depends on the available resources, as I understand it), and it has been relaunched on another node (for example, spark-worker-2).
# Open the port again:
{code}
# iptables -D OUTPUT -p tcp --dport 7077 -j DROP
{code}
# Look at the Spark master UI again.
There are no changes.
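
For reference, a minimal spark-submit invocation for step 2 might look like the sketch below; the master host, application class, and jar path are placeholders, and the --supervise flag is the command-line equivalent of setting spark.driver.supervise = true (it requires cluster deploy mode on standalone):
{code}
# Placeholders: spark-master host, com.example.MyApp class, and jar path.
# --supervise asks the standalone master to restart the driver if it dies.
$ ./bin/spark-submit \
    --master spark://spark-master:7077 \
    --deploy-mode cluster \
    --supervise \
    --class com.example.MyApp \
    /path/to/my-app.jar
{code}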
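The master state from step 5 can also be inspected from the command line. Assuming the master web UI runs on its default port 8080 (spark-master is a placeholder hostname), its /json endpoint should return the cluster state, including the registered applications and their states:
{code}
# Query the standalone master's JSON status page (default web UI port 8080).
$ curl http://spark-master:8080/json
{code}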


In addition, if you look at the processes on the node spark-worker-1:
{code}
# ps ax | grep spark
{code}
you will see that the old driver is still running!


> [Standalone] Master registers application again when driver relaunched
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19900
>                 URL: https://issues.apache.org/jira/browse/SPARK-19900
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Spark Core
>    Affects Versions: 1.6.2
>         Environment: Centos 6.5, spark standalone
>            Reporter: Sergey
>              Labels: Spark, network, standalone, supervise
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org