Posted to issues@spark.apache.org by "zuotingbing (JIRA)" <ji...@apache.org> on 2018/04/17 07:44:00 UTC
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440535#comment-16440535 ]
zuotingbing edited comment on SPARK-15544 at 4/17/18 7:43 AM:
--------------------------------------------------------------
cc [~vanzin] @*[gatorsmile|https://github.com/gatorsmile]*
was (Author: zuo.tingbing9):
cc [~vanzin]
> Bouncing Zookeeper node causes Active spark master to exit
> ----------------------------------------------------------
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1
> Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum
> Reporter: Steven Lowenthal
> Priority: Major
>
> Shutting down a single ZooKeeper node caused the Spark master to exit. The master should have connected to a second ZooKeeper node.
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x154dfc0426b0054, likely server has closed socket, closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x254c701f28d0053, likely server has closed socket, closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master shutting down.
> {code}
> spark-env.sh:
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}
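In the quoted log, the SUSPENDED state change, the loss of leadership, and the master shutdown all carry the same timestamp (00:16:01). The following is a minimal sketch, not part of the original report, of how a master log could be scanned for that sequence. The function names `summarize_failover` and `exited_on_suspend` and the event labels are hypothetical, and the line pattern assumes the log4j layout seen in the quoted output.

```python
import re

# Assumed layout, taken from the quoted log output, e.g.
# "16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.]+): (?P<msg>.*)$"
)

# Message fragments copied from the quoted log; the labels are hypothetical.
MARKERS = [
    ("State change: SUSPENDED", "zk_suspended"),
    ("We have lost leadership", "leadership_lost"),
    ("Leadership has been revoked", "master_shutdown"),
]


def summarize_failover(lines):
    """Return (timestamp, label) pairs for the lines of interest, in log order."""
    events = []
    for line in lines:
        # Tolerate the "> " quote prefix used in the e-mail.
        match = LOG_LINE.match(line.lstrip("> ").rstrip())
        if not match:
            continue
        for fragment, label in MARKERS:
            if fragment in match.group("msg"):
                events.append((match.group("ts"), label))
                break
    return events


def exited_on_suspend(events):
    """True if a master shutdown appears after a SUSPENDED state change."""
    labels = [label for _, label in events]
    if "zk_suspended" not in labels or "master_shutdown" not in labels:
        return False
    return labels.index("zk_suspended") < labels.index("master_shutdown")


SAMPLE = [
    "> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED",
    "> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED",
    "> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost leadership",
    "> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master shutting down.",
]

if __name__ == "__main__":
    for ts, label in summarize_failover(SAMPLE):
        print(ts, label)
    print("exited on suspend:", exited_on_suspend(SAMPLE and summarize_failover(SAMPLE)))
```

Run against a full master log, a check like this only flags the pattern reported here; it does not show whether the client later reconnected to another quorum member.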