Posted to issues@spark.apache.org by "Zhang, Liye (JIRA)" <ji...@apache.org> on 2014/12/29 09:20:13 UTC

[jira] [Created] (SPARK-4991) Worker should reconnect to Master when Master actor restart

Zhang, Liye created SPARK-4991:
----------------------------------

             Summary: Worker should reconnect to Master when Master actor restart
                 Key: SPARK-4991
                 URL: https://issues.apache.org/jira/browse/SPARK-4991
             Project: Spark
          Issue Type: Improvement
          Components: Deploy, Spark Core
    Affects Versions: 1.2.0, 1.1.0, 1.0.0
            Reporter: Zhang, Liye


This is a follow-up JIRA to [SPARK-4989|https://issues.apache.org/jira/browse/SPARK-4989]. When the Master Akka actor encounters an exception, the Master restarts (an Akka actor restart, not a JVM restart), and all of its old state is cleared (including workers, applications, etc.). However, the workers are not aware of this at all. The resulting cluster state is: the master is up and all workers are up, but the master is not aware that the workers exist and ignores every worker's heartbeat because no worker is registered. As a result, the whole cluster is unavailable.
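
The failure mode can be illustrated with a small toy model. This is only a sketch and is not Spark code: the Master and Worker classes, their methods, and the reconnect_on_reject flag are hypothetical names made up for illustration. It assumes the master keeps its registered workers in memory, loses them on restart, and rejects heartbeats from workers it does not know. With reconnect disabled the master ends up with no usable workers; with reconnect enabled each worker re-registers after its heartbeat is rejected.

```python
# Toy model of the scenario in this issue. Not Spark code: every name here
# is hypothetical and chosen only for illustration.


class Master:
    def __init__(self):
        self.workers = set()  # ids of currently registered workers

    def restart(self):
        # Models an actor restart: in-memory state is lost, the process stays up.
        self.workers = set()

    def register(self, worker_id):
        self.workers.add(worker_id)

    def heartbeat(self, worker_id):
        # True if the heartbeat is accepted, False if the worker is unknown.
        return worker_id in self.workers


class Worker:
    def __init__(self, worker_id, master, reconnect_on_reject):
        self.worker_id = worker_id
        self.master = master
        self.reconnect_on_reject = reconnect_on_reject
        master.register(worker_id)

    def send_heartbeat(self):
        accepted = self.master.heartbeat(self.worker_id)
        if not accepted and self.reconnect_on_reject:
            # Behavior requested by this issue: register again, then retry.
            self.master.register(self.worker_id)
            accepted = self.master.heartbeat(self.worker_id)
        return accepted


def usable_workers_after_restart(reconnect_on_reject, n_workers=3):
    """Return how many workers the master knows about after it restarts."""
    master = Master()
    workers = [
        Worker(f"worker-{i}", master, reconnect_on_reject)
        for i in range(n_workers)
    ]
    master.restart()
    for worker in workers:
        worker.send_heartbeat()
    return len(master.workers)


if __name__ == "__main__":
    print(usable_workers_after_restart(reconnect_on_reject=False))  # 0
    print(usable_workers_after_restart(reconnect_on_reject=True))   # 3
```

In this model the rejected heartbeat doubles as the signal to re-register. A real fix could equally have the master send an explicit reconnect request to the unknown worker; which side initiates is a design choice this sketch does not settle.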

For more background on this part, see [SPARK-3736|https://issues.apache.org/jira/browse/SPARK-3736] and [SPARK-4592|https://issues.apache.org/jira/browse/SPARK-4592].


