Posted to user@spark.apache.org by vimal dinakaran <vi...@gmail.com> on 2016/06/28 12:52:46 UTC

Spark master shuts down when one of zookeeper dies

I am using ZooKeeper to provide HA for a Spark cluster. We have a two-node
ZooKeeper cluster.

When one of the ZooKeeper nodes dies, the entire Spark cluster goes down.

Is this expected behaviour?
Am I missing something in the config?

Spark version - 1.6.1.
Zookeeper version - 3.4.6

# spark-env.sh
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"

Below is the log from the Spark master:
ZooKeeperLeaderElectionAgent: We have lost leadership
16/06/27 09:39:30 ERROR Master: Leadership has been revoked -- master shutting down.

Thanks
Vimal
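
Note on the configuration above: it lists two ZooKeeper servers. For
comparison, a minimal sketch of the same spark-env.sh setting with a
three-server ensemble, as tried later in this thread, might look like the
following. The host name zk3 is a hypothetical placeholder and does not come
from the original post.

```shell
# spark-env.sh (sketch; zk3 is a hypothetical third ZooKeeper host)
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"
```

Each master that takes part in the election would presumably need the same
ZooKeeper URL, otherwise the masters may not find each other.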

Re: Spark master shuts down when one of zookeeper dies

Posted by Ted Yu <yu...@gmail.com>.
Looking at Master.scala, I don't see code that would bring the master back up
automatically.
You could probably set up a monitoring tool so that you get an alert when
the master goes down.

e.g.
http://stackoverflow.com/questions/12896998/how-to-set-up-alerts-on-ganglia

More experienced users may have better suggestions.
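
As one possible shape for such a check, here is a minimal sketch in shell
that probes each master's port and reports which ones are unreachable. It
assumes the masters listen on the default standalone port 7077 and that nc
(netcat) with the -z and -w options is installed. The function names and the
host names master1 and master2 are hypothetical; none of this comes from the
thread.

```shell
#!/bin/sh
# Probe: succeeds if host ($1) accepts a TCP connection on port ($2).
# Assumes a netcat that supports -z (scan only) and -w (timeout in seconds).
probe() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Print one status line per host:port argument.
# Returns non-zero if at least one master is unreachable.
check_masters() {
    rc=0
    for hp in "$@"; do
        host=${hp%:*}   # text before the last colon
        port=${hp#*:}   # text after the first colon
        if probe "$host" "$port"; then
            echo "UP   $hp"
        else
            echo "DOWN $hp"
            rc=1
        fi
    done
    return $rc
}
```

Called periodically, for example from cron, something like
check_masters master1:7077 master2:7077 || echo "a Spark master is down"
could feed whatever alerting channel is already in place.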

On Thu, Jun 30, 2016 at 2:09 AM, vimal dinakaran <vi...@gmail.com>
wrote:

> Hi Ted,
> Thanks for the pointers. I have now set up a three-node ZooKeeper cluster.
> With it, only the master dies when a ZooKeeper instance goes down; a new
> master is elected as leader and the cluster stays up.
> But the master that went down never comes back up.
>
> Is this expected? Is there a way to get an alert when a master is down?
> How can I make sure that at least one backup master is always up?
>
> Thanks
> Vimal

Re: Spark master shuts down when one of zookeeper dies

Posted by vimal dinakaran <vi...@gmail.com>.
Hi Ted,
Thanks for the pointers. I have now set up a three-node ZooKeeper cluster.
With it, only the master dies when a ZooKeeper instance goes down; a new
master is elected as leader and the cluster stays up.
But the master that went down never comes back up.

Is this expected? Is there a way to get an alert when a master is down?
How can I make sure that at least one backup master is always up?

Thanks
Vimal
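
One way to approach the last question is a small watchdog, run periodically
on each master host, that starts the master again if its process is gone.
The sketch below is only an illustration: the /opt/spark install path is an
assumption, the function names are made up, and the process check relies on
pgrep matching the master's main class name. With ZooKeeper recovery mode
enabled, a master started this way should register with ZooKeeper again and
wait as a standby while another master holds leadership.

```shell
#!/bin/sh
# Assumed install location; override by exporting SPARK_HOME.
SPARK_HOME=${SPARK_HOME:-/opt/spark}

# Succeeds if a process whose command line mentions the standalone
# master's main class is running on this host.
master_running() {
    pgrep -f org.apache.spark.deploy.master.Master >/dev/null 2>&1
}

# Start the master with the script shipped in Spark's sbin directory.
start_master() {
    "$SPARK_HOME/sbin/start-master.sh"
}

# Start the master only when it is not already running.
ensure_master() {
    if master_running; then
        echo "master already running"
    else
        echo "master not running, starting it"
        start_master
    fi
}
```

A cron entry or a service manager could call ensure_master every minute or
so. A supervisor that restarts the process directly would be an alternative
to polling.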




On Tue, Jun 28, 2016 at 7:24 PM, Ted Yu <yu...@gmail.com> wrote:

> Please see these posts regarding the number of nodes in the quorum:
>
>
> http://stackoverflow.com/questions/13022244/zookeeper-reliability-three-versus-five-nodes
>
> http://www.ibm.com/developerworks/library/bd-zookeeper/
>   the paragraph starting with 'A quorum is represented by a strict
> majority of nodes'
>
> FYI

Re: Spark master shuts down when one of zookeeper dies

Posted by Ted Yu <yu...@gmail.com>.
Please see these posts regarding the number of nodes in the quorum:

http://stackoverflow.com/questions/13022244/zookeeper-reliability-three-versus-five-nodes

http://www.ibm.com/developerworks/library/bd-zookeeper/
  the paragraph starting with 'A quorum is represented by a strict majority
of nodes'

FYI
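
To make the majority rule concrete, here is a small worked sketch in shell.
It only does the arithmetic: an ensemble of n servers needs floor(n/2) + 1
of them reachable to form a strict majority, so it can lose n minus that
many. The function names are made up for illustration.

```shell
#!/bin/sh
# Smallest strict majority of an ensemble of $1 servers.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still reachable.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 5; do
    echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
# prints:
# ensemble=2 quorum=2 tolerated_failures=0
# ensemble=3 quorum=2 tolerated_failures=1
# ensemble=5 quorum=3 tolerated_failures=2
```

With two servers the quorum is two, so losing either one leaves no majority.
That is consistent with the behaviour described in the original post, and it
is why an odd ensemble of three or more is the usual recommendation.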
