Posted to commits@helix.apache.org by "Kanak Biscuitwala (JIRA)" <ji...@apache.org> on 2013/11/22 19:02:35 UTC

[jira] [Created] (HELIX-321) Controller forgets that it's the leader

Kanak Biscuitwala created HELIX-321:
---------------------------------------

             Summary: Controller forgets that it's the leader
                 Key: HELIX-321
                 URL: https://issues.apache.org/jira/browse/HELIX-321
             Project: Apache Helix
          Issue Type: Bug
            Reporter: Kanak Biscuitwala
         Attachments: leader_election.txt

1. ZooKeeper session times out and expires (see log messages):
INFO [2013-11-22 17:34:11,919] main-SendThread(eat1-app87.corp:2181) - org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 20171ms for sessionid 0x142016175c10856, closing socket connection and attempting reconnect
INFO [2013-11-22 17:34:22,051] main-SendThread(eat1-app87.corp:2181) - org.apache.zookeeper.ClientCnxn - Opening socket connection to server eat1-app87.corp/172.18.158.133:2181
INFO [2013-11-22 17:34:22,052] main-SendThread(eat1-app87.corp:2181) - org.apache.zookeeper.ClientCnxn - Socket connection established to eat1-app87.corp/172.18.158.133:2181, initiating session
INFO [2013-11-22 17:34:22,055] main-SendThread(eat1-app87.corp:2181) - org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x142016175c10856 has expired, closing socket connection
INFO [2013-11-22 17:34:22,055] main-EventThread - org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
INFO [2013-11-22 17:34:22,055] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixConnection - KeeperState:Expired, expiredSessionId: 142016175c10856
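
The timestamps above look consistent with a genuine session expiry rather than a spurious one. The client had already been out of contact for 20171 ms when it logged the timeout, and about 10 more seconds passed before the socket was re-established. A rough check of that arithmetic is sketched below. It assumes the expired session had the same 30000 ms timeout that the log reports for the new session (the old session's timeout is not shown in the log), and the helper name silence_before_reconnect_ms is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used by the log lines in this report, e.g. "2013-11-22 17:34:11,919"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def silence_before_reconnect_ms(timeout_logged_at, reconnected_at, silent_ms_at_timeout):
    """Total time without server contact at the moment the socket is re-established."""
    t0 = datetime.strptime(timeout_logged_at, FMT)
    t1 = datetime.strptime(reconnected_at, FMT)
    gap_ms = round((t1 - t0).total_seconds() * 1000)
    return silent_ms_at_timeout + gap_ms


# Values copied from the log: timeout logged at 17:34:11,919 after 20171 ms of silence,
# socket re-established at 17:34:22,052.
total_ms = silence_before_reconnect_ms(
    "2013-11-22 17:34:11,919", "2013-11-22 17:34:22,052", 20171
)

# ASSUMPTION: the old session used the same timeout as the new one (30000 ms).
ASSUMED_SESSION_TIMEOUT_MS = 30000
expired_expected = total_ms > ASSUMED_SESSION_TIMEOUT_MS
```

Under that assumption the client was silent for slightly longer than the session timeout, so the server expiring session 0x142016175c10856 is the expected outcome.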

2. Controller reconnects, removes all callbacks
INFO [2013-11-22 17:34:22,068] main-SendThread(eat1-app87.corp:2181) - org.apache.zookeeper.ClientCnxn - Socket connection established to eat1-app87.corp/172.18.158.133:2181, initiating session
INFO [2013-11-22 17:34:22,126] main-SendThread(eat1-app87.corp:2181) - org.apache.zookeeper.ClientCnxn - Session establishment complete on server eat1-app87.corp/172.18.158.133:2181, sessionid = 0x142016175c1085c, negotiated timeout = 30000
INFO [2013-11-22 17:34:22,126] main-EventThread - org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)

3. Callbacks ignored; not leader, relinquishes leadership
ERROR [2013-11-22 17:34:22,187] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.controller.GenericHelixController - Cluster manager: controller1 is not leader. Pipeline will not be invoked
INFO [2013-11-22 17:34:22,200] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 reqlinquishes leadership of cluster: perf-test-cluster

4. Controller reacquires leadership
INFO [2013-11-22 17:34:22,204] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 is trying to acquire leadership for cluster: perf-test-cluster
INFO [2013-11-22 17:34:22,215] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 acquires leadership of cluster: perf-test-cluster

5. Controller thinks it's not leader even though the LEADER node is in place and correct
ERROR [2013-11-22 17:34:22,294] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.controller.GenericHelixController - Cluster manager: controller1 is not leader. Pipeline will not be invoked

6. Controller tries to become leader when it already is?
INFO [2013-11-22 17:34:22,335] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 is trying to acquire leadership for cluster: perf-test-cluster
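
For anyone going through the attached log, the leadership-related lines can be pulled into a compact timeline. The following is only a sketch for reading the log, not part of Helix: the regular expression, the event labels, and the function name timeline are all made up here, and the substrings it matches on are taken from the messages quoted in this report (including the "reqlinquishes" spelling that appears in the log).

```python
import re

# LEVEL [timestamp] thread - logger - message, as seen in the quoted log lines
LINE = re.compile(
    r"^(?P<level>[A-Z]+) \[(?P<ts>[^\]]+)\] (?P<thread>.+?) - (?P<logger>\S+) - (?P<msg>.*)$"
)

# (substring of the message, label); the labels are hypothetical names for this sketch
EVENTS = [
    ("is not leader", "NOT_LEADER"),
    ("linquishes leadership", "RELINQUISH"),
    ("trying to acquire leadership", "TRY_ACQUIRE"),
    ("acquires leadership", "ACQUIRED"),
]


def timeline(lines):
    """Return (timestamp, label) pairs for lines that match a known event."""
    out = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        for needle, label in EVENTS:
            if needle in m.group("msg"):
                out.append((m.group("ts"), label))
                break
    return out


# Log lines copied from this report
SAMPLE = [
    "ERROR [2013-11-22 17:34:22,187] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.controller.GenericHelixController - Cluster manager: controller1 is not leader. Pipeline will not be invoked",
    "INFO [2013-11-22 17:34:22,200] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 reqlinquishes leadership of cluster: perf-test-cluster",
    "INFO [2013-11-22 17:34:22,204] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 is trying to acquire leadership for cluster: perf-test-cluster",
    "INFO [2013-11-22 17:34:22,215] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 acquires leadership of cluster: perf-test-cluster",
    "ERROR [2013-11-22 17:34:22,294] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.controller.GenericHelixController - Cluster manager: controller1 is not leader. Pipeline will not be invoked",
    "INFO [2013-11-22 17:34:22,335] ZkClient-EventThread-10-eat1-app87.corp:2181 - org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 is trying to acquire leadership for cluster: perf-test-cluster",
]

for ts, label in timeline(SAMPLE):
    print(ts, label)
```

On the lines quoted here, the suspicious part of the sequence is a NOT_LEADER event that follows an ACQUIRED event within the same new session.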

Logs attached