Posted to user@storm.apache.org by "Mitchell Rathbun (BLOOMBERG/ 731 LEX)" <mr...@bloomberg.net> on 2019/09/23 16:46:19 UTC
Leader Election issues on cluster restart
We are currently running a Storm cluster on a single machine, so there is one Nimbus/Supervisor instance in the cluster. We recently hit an issue where Nimbus started but was unable to become leader, even though no other Nimbus instances were running at the time. The cluster was seemingly brought down cleanly:
2019-09-21 22:12:47,518 INFO nimbus [Thread-7] Shutting down master
2019-09-21 22:12:47,520 INFO CuratorFrameworkImpl [Curator-Framework-0] backgroundOperationsLoop exiting
2019-09-21 22:12:47,527 INFO ZooKeeper [Thread-7] Session: 0x30000223e30079a closed
2019-09-21 22:12:47,527 INFO ClientCnxn [main-EventThread] EventThread shut down
2019-09-21 22:12:47,528 INFO CuratorFrameworkImpl [Curator-Framework-0] backgroundOperationsLoop exiting
2019-09-21 22:12:47,533 INFO ClientCnxn [main-EventThread] EventThread shut down
2019-09-21 22:12:47,533 INFO ZooKeeper [Thread-7] Session: 0x30000223e30079b closed
2019-09-21 22:12:47,534 INFO CuratorFrameworkImpl [Curator-Framework-0] backgroundOperationsLoop exiting
2019-09-21 22:12:47,539 INFO ClientCnxn [main-EventThread] EventThread shut down
2019-09-21 22:12:47,539 INFO ZooKeeper [Thread-7] Session: 0x30000223e300798 closed
2019-09-21 22:12:47,539 INFO nimbus [Thread-7] Shut down master
The cluster was then brought back up about 20 minutes later. As soon as it came up, we started seeing:
2019-09-21 22:32:47,082 INFO JmxPreparableReporter [main] Preparing...
2019-09-21 22:32:47,098 INFO common [main] Started statistics report plugin...
2019-09-21 22:32:47,140 INFO nimbus [main] Starting nimbus server for storm version '1.2.1'
2019-09-21 22:32:47,219 INFO PlainSaslTransportPlugin [main] SASL PLAIN transport factory will be used
2019-09-21 22:32:47,858 INFO nimbus [timer] not a leader, skipping assignments
2019-09-21 22:32:47,858 INFO nimbus [timer] not a leader, skipping cleanup
2019-09-21 22:32:47,860 INFO nimbus [timer] not a leader, skipping credential renewal.
2019-09-21 22:32:49,134 INFO AbstractSaslServerCallbackHandler [pool-14-thread-1] Successfully authenticated client: authenticationID = op authorizationID = op
2019-09-21 22:32:49,171 INFO AbstractSaslServerCallbackHandler [pool-14-thread-2] Successfully authenticated client: authenticationID = op authorizationID = op
2019-09-21 22:32:57,858 INFO nimbus [timer] not a leader, skipping assignments
2019-09-21 22:32:57,859 INFO nimbus [timer] not a leader, skipping cleanup
2019-09-21 22:33:07,860 INFO nimbus [timer] not a leader, skipping assignments
2019-09-21 22:33:07,860 INFO nimbus [timer] not a leader, skipping cleanup
2019-09-21 22:33:17,862 INFO nimbus [timer] not a leader, skipping assignments
followed shortly by:
2019-09-21 22:33:52,409 WARN nimbus [pool-14-thread-7] Topology submission exception. (topology name='WingmanTopology4159') #error {
:cause not a leader, current leader is NimbusInfo{host='trslnydtraap01', port=30553, isLeader=true}
:via
[{:type java.lang.RuntimeException
:message not a leader, current leader is NimbusInfo{host='trslnydtraap01', port=30553, isLeader=true}
:at [org.apache.storm.daemon.nimbus$is_leader doInvoke nimbus.clj 150]}]
:trace
What could cause this election issue? Since no other leader processes were running or known in the cluster, I assume some cluster state was not cleaned up correctly, either in ZooKeeper or on disk. In general, how does Storm mark whether a cluster has a leader? What could be causing the behavior shown above?
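For reference, my understanding of the mechanism (and I may be wrong, hence the question): Nimbus leadership is decided through a Curator-style leader latch in ZooKeeper, where each Nimbus creates an ephemeral sequential znode under a latch path and the lowest sequence number wins; an ephemeral node disappears only when its owning session ends. The sketch below is a simplified in-memory model of that scheme, not Storm's actual code, and the class, node names, and session ids in it are illustrative only:

```python
# Simplified in-memory model of leader-latch election over ephemeral
# sequential znodes (an illustration of the mechanism, NOT Storm's code).
# A stale session that has not yet expired keeps its latch node alive,
# which blocks a freshly restarted participant from becoming leader.
import itertools

class LatchPath:
    """Stands in for a ZooKeeper latch path (path name illustrative)."""
    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # node name -> owning session id

    def create_ephemeral_sequential(self, session_id):
        # Zero-padded sequence numbers sort lexicographically.
        name = f"latch-{next(self._seq):010d}"
        self._nodes[name] = session_id
        return name

    def leader(self):
        # The participant holding the lowest sequence number leads.
        return min(self._nodes) if self._nodes else None

    def expire_session(self, session_id):
        # ZooKeeper deletes a session's ephemeral nodes on close/expiry.
        self._nodes = {n: s for n, s in self._nodes.items() if s != session_id}

path = LatchPath()
old = path.create_ephemeral_sequential(session_id=0x111)

# A restarted Nimbus joins while the old session's node still exists
# (e.g. the session never closed and its timeout has not elapsed):
new = path.create_ephemeral_sequential(session_id=0x222)
assert path.leader() == old   # the new participant is "not a leader"

# Once the stale session finally expires, leadership moves over:
path.expire_session(0x111)
assert path.leader() == new
```

If this model is roughly right, a leftover latch node from a previous session (or stale on-disk state pointing at it) would explain a restarted Nimbus reporting "not a leader" indefinitely.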