Posted to issues@storm.apache.org by "Ethan Li (JIRA)" <ji...@apache.org> on 2018/03/27 15:32:00 UTC

[jira] [Updated] (STORM-3012) Nimbus will crash if pacemaker is restarted

     [ https://issues.apache.org/jira/browse/STORM-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3012:
----------------------------
    Description: 
Below is nimbus.log from when I restarted Pacemaker. Nimbus crashed because of an NPE.

{code:java}
2018-03-26 21:39:18.404 main o.a.s.z.LeaderElectorImp [INFO] Queued up for leader lock.
2018-03-26 21:39:18.458 main o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
2018-03-26 21:39:18.461 main o.a.s.d.m.r.JmxPreparableReporter [INFO] Preparing...
2018-03-26 21:39:18.527 main o.a.s.m.StormMetricsRegistry [INFO] Started statistics report plugin...
2018-03-26 21:39:18.710 main o.a.s.m.n.Login [INFO] successfully logged in.
2018-03-26 21:39:18.738 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread started.
2018-03-26 21:39:18.739 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
2018-03-26 21:39:18.739 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting at:        Mon Mar 26 21:39:18 UTC 2018
2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires:                  Tue Mar 27 21:39:18 UTC 2018
2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh sleeping until: Tue Mar 27 17:39:22 UTC 2018
2018-03-26 21:39:18.756 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@148c7c4b
2018-03-26 21:39:18.807 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
2018-03-26 21:39:18.814 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
2018-03-26 21:39:18.815 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2018-03-26 21:39:18.816 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
2018-03-26 21:39:18.817 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cdd, negotiated timeout = 40000
2018-03-26 21:39:18.818 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2018-03-26 21:39:18.839 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2018-03-26 21:39:18.841 main o.a.z.ZooKeeper [INFO] Session: 0x1624f6d49dd0cdd closed
2018-03-26 21:39:18.842 main-EventThread o.a.z.ClientCnxn [INFO] EventThread shut down
2018-03-26 21:39:18.844 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
2018-03-26 21:39:18.844 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2018-03-26 21:39:18.875 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@211febf3
2018-03-26 21:39:18.908 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
2018-03-26 21:39:18.909 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2018-03-26 21:39:18.910 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
2018-03-26 21:39:18.911 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cde, negotiated timeout = 40000
2018-03-26 21:39:18.920 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
2018-03-26 21:39:18.923 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2018-03-26 21:39:18.986 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server for storm version '2.0.0.y'
2018-03-26 21:39:19.931 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-blobs [] local-topology-blobs [] diff-topology-blobs []
2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-dependencies [] local-blobs [] diff-topology-dependencies []
2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting leadership, all active topologies and corresponding dependencies found locally.
2018-03-26 21:39:20.636 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 1381 ms for 0 topologies
2018-03-26 21:39:20.901 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:20.901 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 101ms (NOT MAX)
2018-03-26 21:39:21.003 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.003 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 102ms (NOT MAX)
2018-03-26 21:39:21.106 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.106 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 106ms (NOT MAX)
2018-03-26 21:39:21.214 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.214 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 115ms (NOT MAX)
2018-03-26 21:39:21.331 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.331 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 129ms (NOT MAX)
2018-03-26 21:39:21.462 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.462 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 162ms (NOT MAX)
2018-03-26 21:39:21.626 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.626 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 176ms (NOT MAX)
2018-03-26 21:39:21.807 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.807 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 319ms (NOT MAX)
2018-03-26 21:39:21.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:22.128 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:22.128 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 603ms (NOT MAX)
2018-03-26 21:39:22.733 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:22.733 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 868ms (NOT MAX)
2018-03-26 21:39:22.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:23.603 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:23.603 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 1494ms (NOT MAX)
2018-03-26 21:39:23.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:24.889 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:25.100 client-worker-4 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:36922 to openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:25.107 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
2018-03-26 21:39:25.116 client-worker-4 o.a.s.m.n.Login [INFO] successfully logged in.
2018-03-26 21:39:25.121 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@116ce525
2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:37614 to openqe74blue-n2.blue.ygrid.yahoo.com/10.215.76.243:6699
2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
2018-03-26 21:39:25.763 client-worker-1 o.a.s.m.n.Login [INFO] successfully logged in.
2018-03-26 21:39:25.765 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@493cfe64
2018-03-26 21:39:26.596 timer o.a.s.d.n.Nimbus [ERROR] Error while processing event
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2508) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
Caused by: java.lang.NullPointerException
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:195) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        ... 2 more
2018-03-26 21:39:26.596 timer o.a.s.u.Utils [ERROR] Halting process: Error while processing event
java.lang.RuntimeException: Halting process: Error while processing event
        at org.apache.storm.utils.Utils.exitProcess(Utils.java:469) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$new$23(Nimbus.java:1154) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:106) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:26.600 Thread-16 o.a.s.u.Utils [INFO] Halting after 5 seconds
2018-03-26 21:39:26.606 Thread-15 o.a.s.d.n.Nimbus [INFO] Shutting down master
2018-03-26 21:39:31.600 Thread-16 o.a.s.u.Utils [WARN] Forcing Halt...
{code}

This is because of what happens at [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L195-L198] when the response slot is empty:

{code:java}
HBMessage ret = messages[next];
if (ret == null) {
    // This can happen if we lost the connection and subsequently reconnected or timed out.
    send(m); // the retry's response is discarded, so ret is still null below
}
messages[next] = null;
LOG.debug("Got Response: {}", ret);
return ret;
{code}
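In that case {{send}} returns null: the recursive {{send(m)}} retries the request, but its return value is discarded, so {{ret}} is still null when it is returned. The null result is then added to the response list in [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClientPool.java#L65-L66]: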
{code:java}
for (String s : servers) {
    HBMessage response = getClientForServer(s).send(m);
    responses.add(response);
}
{code}
This leads to the loop in [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/PaceMakerStateStorage.java#L195], which dereferences every response without a null check:

{code:java}
for (HBMessage response : responses) {
    if (response.get_type() != HBServerMessageType.GET_ALL_NODES_FOR_PATH_RESPONSE) {
        LOG.error("get_worker_hb_children: Invalid Response Type");
        continue;
    }
    if (response.get_data().get_nodes().get_pulseIds() != null) {
        retSet.addAll(response.get_data().get_nodes().get_pulseIds());
    }
}
{code}
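
Calling {{response.get_type()}} on the null entry is where the NPE is thrown (PaceMakerStateStorage.java:195 in the stack trace above).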
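A minimal Java sketch of the two places the null could be handled, assuming a simplified model rather than Storm's real classes: {{Response}} stands in for {{HBMessage}}, and a {{Supplier<Response>}} stands in for one write-and-wait round trip that yields null when the connection was lost or the wait timed out. {{sendWithRetries}}, {{collectPulseIds}} and {{PacemakerRetrySketch}} are hypothetical names, not taken from the Storm code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PacemakerRetrySketch {

    // Hypothetical stand-in for HBMessage: a response type plus optional pulse IDs.
    static final class Response {
        final String type;
        final List<String> pulseIds;

        Response(String type, List<String> pulseIds) {
            this.type = type;
            this.pulseIds = pulseIds;
        }
    }

    static final String GET_ALL_NODES_RESPONSE = "GET_ALL_NODES_FOR_PATH_RESPONSE";

    // Client side: use the result of each retry and never return null.
    // 'attempt' models one round trip (hypothetical); null means no response.
    static Response sendWithRetries(Supplier<Response> attempt, int maxRetries) {
        for (int tries = 0; tries <= maxRetries; tries++) {
            Response ret = attempt.get();
            if (ret != null) {
                return ret;
            }
        }
        throw new IllegalStateException("no response after " + (maxRetries + 1) + " attempts");
    }

    // Consumer side: skip null entries instead of dereferencing them.
    static Set<String> collectPulseIds(List<Response> responses) {
        Set<String> retSet = new HashSet<>();
        for (Response response : responses) {
            if (response == null) {
                continue; // the quoted loop throws an NPE at this point
            }
            if (!GET_ALL_NODES_RESPONSE.equals(response.type)) {
                continue;
            }
            if (response.pulseIds != null) {
                retSet.addAll(response.pulseIds);
            }
        }
        return retSet;
    }

    public static void main(String[] args) {
        // Simulate a Pacemaker restart: the first two round trips return nothing.
        AtomicInteger calls = new AtomicInteger();
        Supplier<Response> flaky = () -> calls.incrementAndGet() <= 2
                ? null
                : new Response(GET_ALL_NODES_RESPONSE, Arrays.asList("topo-1"));

        List<Response> responses = new ArrayList<>();
        responses.add(sendWithRetries(flaky, 3));
        responses.add(null); // the kind of entry the quoted client path can produce
        System.out.println(collectPulseIds(responses)); // prints [topo-1]
        System.out.println(calls.get() + " attempts");  // prints 3 attempts
    }
}
```

Throwing once the retries are exhausted, instead of returning null, turns a silent null into an exception the caller can see and handle; whether Nimbus should then skip that cleanup round or halt is a separate decision.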

  was:
Below is nimbus.log from when I restarted Pacemaker. Nimbus crashed because of an NPE.

{code:java}
2018-03-26 21:39:18.404 main o.a.s.z.LeaderElectorImp [INFO] Queued up for leader lock.
2018-03-26 21:39:18.458 main o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
2018-03-26 21:39:18.461 main o.a.s.d.m.r.JmxPreparableReporter [INFO] Preparing...
2018-03-26 21:39:18.527 main o.a.s.m.StormMetricsRegistry [INFO] Started statistics report plugin...
2018-03-26 21:39:18.710 main o.a.s.m.n.Login [INFO] successfully logged in.
2018-03-26 21:39:18.738 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread started.
2018-03-26 21:39:18.739 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
2018-03-26 21:39:18.739 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting at:        Mon Mar 26 21:39:18 UTC 2018
2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires:                  Tue Mar 27 21:39:18 UTC 2018
2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh sleeping until: Tue Mar 27 17:39:22 UTC 2018
2018-03-26 21:39:18.756 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@148c7c4b
2018-03-26 21:39:18.807 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
2018-03-26 21:39:18.814 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
2018-03-26 21:39:18.815 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2018-03-26 21:39:18.816 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
2018-03-26 21:39:18.817 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cdd, negotiated timeout = 40000
2018-03-26 21:39:18.818 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2018-03-26 21:39:18.839 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2018-03-26 21:39:18.841 main o.a.z.ZooKeeper [INFO] Session: 0x1624f6d49dd0cdd closed
2018-03-26 21:39:18.842 main-EventThread o.a.z.ClientCnxn [INFO] EventThread shut down
2018-03-26 21:39:18.844 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
2018-03-26 21:39:18.844 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2018-03-26 21:39:18.875 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@211febf3
2018-03-26 21:39:18.908 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
2018-03-26 21:39:18.909 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2018-03-26 21:39:18.910 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
2018-03-26 21:39:18.911 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cde, negotiated timeout = 40000
2018-03-26 21:39:18.920 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
2018-03-26 21:39:18.923 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2018-03-26 21:39:18.986 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server for storm version '2.0.0.y'
2018-03-26 21:39:19.931 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-blobs [] local-topology-blobs [] diff-topology-blobs []
2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-dependencies [] local-blobs [] diff-topology-dependencies []
2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting leadership, all active topologies and corresponding dependencies found locally.
2018-03-26 21:39:20.636 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 1381 ms for 0 topologies
2018-03-26 21:39:20.901 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:20.901 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 101ms (NOT MAX)
2018-03-26 21:39:21.003 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.003 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 102ms (NOT MAX)
2018-03-26 21:39:21.106 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.106 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 106ms (NOT MAX)
2018-03-26 21:39:21.214 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.214 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 115ms (NOT MAX)
2018-03-26 21:39:21.331 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.331 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 129ms (NOT MAX)
2018-03-26 21:39:21.462 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.462 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 162ms (NOT MAX)
2018-03-26 21:39:21.626 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.626 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 176ms (NOT MAX)
2018-03-26 21:39:21.807 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:21.807 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 319ms (NOT MAX)
2018-03-26 21:39:21.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:22.128 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:22.128 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 603ms (NOT MAX)
2018-03-26 21:39:22.733 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:22.733 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 868ms (NOT MAX)
2018-03-26 21:39:22.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:23.603 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:23.603 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 1494ms (NOT MAX)
2018-03-26 21:39:23.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:24.889 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
        at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:25.100 client-worker-4 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:36922 to openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
2018-03-26 21:39:25.107 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
2018-03-26 21:39:25.116 client-worker-4 o.a.s.m.n.Login [INFO] successfully logged in.
2018-03-26 21:39:25.121 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@116ce525
2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:37614 to openqe74blue-n2.blue.ygrid.yahoo.com/10.215.76.243:6699
2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
2018-03-26 21:39:25.763 client-worker-1 o.a.s.m.n.Login [INFO] successfully logged in.
2018-03-26 21:39:25.765 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@493cfe64
2018-03-26 21:39:26.596 timer o.a.s.d.n.Nimbus [ERROR] Error while processing event
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2508) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
Caused by: java.lang.NullPointerException
        at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:195) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        ... 2 more
2018-03-26 21:39:26.596 timer o.a.s.u.Utils [ERROR] Halting process: Error while processing event
java.lang.RuntimeException: Halting process: Error while processing event
        at org.apache.storm.utils.Utils.exitProcess(Utils.java:469) ~[storm-client-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.daemon.nimbus.Nimbus.lambda$new$23(Nimbus.java:1154) ~[storm-server-2.0.0.y.jar:2.0.0.y]
        at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:106) ~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-03-26 21:39:26.600 Thread-16 o.a.s.u.Utils [INFO] Halting after 5 seconds
2018-03-26 21:39:26.606 Thread-15 o.a.s.d.n.Nimbus [INFO] Shutting down master
2018-03-26 21:39:31.600 Thread-16 o.a.s.u.Utils [WARN] Forcing Halt...
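{code}

This is because of the branch at [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L195-L198] in PacemakerClient.send(), which is taken when a response slot is still empty after the connection was lost and re-established or the request timed out:
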
{code:java}
HBMessage ret = messages[next];
if (ret == null) {
    // This can happen if we lost the connection and subsequently reconnected or timed out.
    send(m); // the result of this retry is discarded, so ret is still null below
}
messages[next] = null;
LOG.debug("Got Response: {}", ret);
return ret;
{code}
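Because the result of the recursive send(m) call is discarded, ret is still null and send() returns null. PacemakerClientPool.sendAll() then adds that null to its list of responses at [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClientPool.java#L65-L66]: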
{code:java}
for (String s : servers) {
    HBMessage response = getClientForServer(s).send(m);
    responses.add(response);
}
{code}
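Finally, get_worker_hb_children() at [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/PaceMakerStateStorage.java#L195] iterates over the responses without a null check, so calling response.get_type() on the null entry throws the NullPointerException:

{code:java}
for (HBMessage response : responses) {
    // response is null for any server whose send() returned null
    if (response.get_type() != HBServerMessageType.GET_ALL_NODES_FOR_PATH_RESPONSE) {
        LOG.error("get_worker_hb_children: Invalid Response Type");
        continue;
    }
    if (response.get_data().get_nodes().get_pulseIds() != null) {
        retSet.addAll(response.get_data().get_nodes().get_pulseIds());
    }
}
{code}

The exception escapes Nimbus.doCleanup() on the timer thread, and the timer's error handler then halts Nimbus, as the "Halting process: Error while processing event" entries in the log above show.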
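To make the failure chain concrete, here is a minimal, self-contained Java sketch. It does not use the real Storm classes: strings stand in for HBMessage responses, and each Supplier call stands in for one write-and-read-slot attempt, returning null when no response arrived. buggySend mirrors the reported send() (with a hypothetical retriesLeft counter added only so the model terminates). fixedSend and collect are two possible changes, not necessarily what Storm adopted: the first retries in a loop and throws instead of returning null; the second is a stripped-down version of the get_worker_hb_children loop that skips null entries.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Simplified, hypothetical model of send() -> sendAll() -> get_worker_hb_children.
final class PacemakerNullSketch {

    // Mirrors the reported send(): the retry's result is dropped, so a null
    // from the first attempt reaches the caller.
    static String buggySend(Supplier<String> attempt, int retriesLeft) {
        String ret = attempt.get();
        if (ret == null && retriesLeft > 0) {
            buggySend(attempt, retriesLeft - 1); // return value discarded
        }
        return ret;
    }

    // Possible fix (assumption): retry in a loop, return the first non-null
    // response, and throw instead of returning null once retries run out.
    static String fixedSend(Supplier<String> attempt, int maxRetries) {
        for (int i = 0; i <= maxRetries; i++) {
            String ret = attempt.get();
            if (ret != null) {
                return ret;
            }
        }
        throw new IllegalStateException("No response after " + (maxRetries + 1) + " attempts");
    }

    // Defensive consumer: null entries are skipped instead of dereferenced.
    static Set<String> collect(List<String> responses) {
        Set<String> retSet = new HashSet<>();
        for (String response : responses) {
            if (response == null) {
                continue; // a server that did not answer contributes nothing
            }
            retSet.add(response);
        }
        return retSet;
    }

    // Hypothetical test helper: null for the first `failures` calls, then "response".
    static Supplier<String> flakyServer(int failures) {
        AtomicInteger calls = new AtomicInteger();
        return () -> calls.getAndIncrement() < failures ? null : "response";
    }

    public static void main(String[] args) {
        List<String> responses = new ArrayList<>();
        responses.add(buggySend(flakyServer(1), 3)); // null, as after a pacemaker restart
        responses.add(fixedSend(flakyServer(1), 3)); // "response"
        System.out.println(responses);          // prints [null, response]
        System.out.println(collect(responses)); // prints [response]
    }
}
{code}

Of the two changes, fixing send() addresses the cause; the null check only stops one unreachable pacemaker server from crashing Nimbus, at the cost of that server's heartbeats being missing from that round's result. A loop also avoids the nested send() frames visible in the stack traces above.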

 


> Nimbus will crash if pacemaker is restarted
> -------------------------------------------
>
>                 Key: STORM-3012
>                 URL: https://issues.apache.org/jira/browse/STORM-3012
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Ethan Li
>            Priority: Major
>
> Below is the nimbus.log when I restarted pacemaker. Nimbus crashed because of NPE.
>  
>  
> {code:java}
> 2018-03-26 21:39:18.404 main o.a.s.z.LeaderElectorImp [INFO] Queued up for leader lock.
> 2018-03-26 21:39:18.458 main o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
> 2018-03-26 21:39:18.461 main o.a.s.d.m.r.JmxPreparableReporter [INFO] Preparing...
> 2018-03-26 21:39:18.527 main o.a.s.m.StormMetricsRegistry [INFO] Started statistics report plugin...
> 2018-03-26 21:39:18.710 main o.a.s.m.n.Login [INFO] successfully logged in.
> 2018-03-26 21:39:18.738 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread started.
> 2018-03-26 21:39:18.739 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
> 2018-03-26 21:39:18.739 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting at:        Mon Mar 26 21:39:18 UTC 2018
> 2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires:                  Tue Mar 27 21:39:18 UTC 2018
> 2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh sleeping until: Tue Mar 27 17:39:22 UTC 2018
> 2018-03-26 21:39:18.756 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@148c7c4b
> 2018-03-26 21:39:18.807 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
> 2018-03-26 21:39:18.814 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
> 2018-03-26 21:39:18.815 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 2018-03-26 21:39:18.816 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
> 2018-03-26 21:39:18.817 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cdd, negotiated timeout = 40000
> 2018-03-26 21:39:18.818 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
> 2018-03-26 21:39:18.839 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
> 2018-03-26 21:39:18.841 main o.a.z.ZooKeeper [INFO] Session: 0x1624f6d49dd0cdd closed
> 2018-03-26 21:39:18.842 main-EventThread o.a.z.ClientCnxn [INFO] EventThread shut down
> 2018-03-26 21:39:18.844 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
> 2018-03-26 21:39:18.844 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-03-26 21:39:18.875 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@211febf3
> 2018-03-26 21:39:18.908 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
> 2018-03-26 21:39:18.909 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 2018-03-26 21:39:18.910 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
> 2018-03-26 21:39:18.911 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cde, negotiated timeout = 40000
> 2018-03-26 21:39:18.920 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
> 2018-03-26 21:39:18.923 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
> 2018-03-26 21:39:18.986 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server for storm version '2.0.0.y'
> 2018-03-26 21:39:19.931 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-blobs [] local-topology-blobs [] diff-topology-blobs []
> 2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-dependencies [] local-blobs [] diff-topology-dependencies []
> 2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting leadership, all active topologies and corresponding dependencies found locally.
> 2018-03-26 21:39:20.636 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 1381 ms for 0 topologies
> 2018-03-26 21:39:20.901 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:20.901 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 101ms (NOT MAX)
> 2018-03-26 21:39:21.003 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.003 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 102ms (NOT MAX)
> 2018-03-26 21:39:21.106 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.106 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 106ms (NOT MAX)
> 2018-03-26 21:39:21.214 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.214 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 115ms (NOT MAX)
> 2018-03-26 21:39:21.331 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.331 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 129ms (NOT MAX)
> 2018-03-26 21:39:21.462 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.462 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 162ms (NOT MAX)
> 2018-03-26 21:39:21.626 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.626 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 176ms (NOT MAX)
> 2018-03-26 21:39:21.807 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.807 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 319ms (NOT MAX)
> 2018-03-26 21:39:21.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
>         at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:22.128 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:22.128 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 603ms (NOT MAX)
> 2018-03-26 21:39:22.733 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:22.733 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 868ms (NOT MAX)
> 2018-03-26 21:39:22.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
>         at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:23.603 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:23.603 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 1494ms (NOT MAX)
> 2018-03-26 21:39:23.888 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
>         at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:24.889 timer o.a.s.p.PacemakerClient [ERROR] error attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting for channel ready.
>         at org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:25.100 client-worker-4 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:36922 to openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:25.107 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
> 2018-03-26 21:39:25.116 client-worker-4 o.a.s.m.n.Login [INFO] successfully logged in.
> 2018-03-26 21:39:25.121 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@116ce525
> 2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:37614 to openqe74blue-n2.blue.ygrid.yahoo.com/10.215.76.243:6699
> 2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
> 2018-03-26 21:39:25.763 client-worker-1 o.a.s.m.n.Login [INFO] successfully logged in.
> 2018-03-26 21:39:25.765 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@493cfe64
> 2018-03-26 21:39:26.596 timer o.a.s.d.n.Nimbus [ERROR] Error while processing event
> java.lang.RuntimeException: java.lang.NullPointerException
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2508) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> Caused by: java.lang.NullPointerException
>         at org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:195) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         ... 2 more
> 2018-03-26 21:39:26.596 timer o.a.s.u.Utils [ERROR] Halting process: Error while processing event
> java.lang.RuntimeException: Halting process: Error while processing event
>         at org.apache.storm.utils.Utils.exitProcess(Utils.java:469) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.lambda$new$23(Nimbus.java:1154) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:106) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:26.600 Thread-16 o.a.s.u.Utils [INFO] Halting after 5 seconds
> 2018-03-26 21:39:26.606 Thread-15 o.a.s.d.n.Nimbus [INFO] Shutting down master
> 2018-03-26 21:39:31.600 Thread-16 o.a.s.u.Utils [WARN] Forcing Halt...
> {code}
>  
>  
> This happens because of [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L195-L198]:
> 
> {code:java}
> HBMessage ret = messages[next];
> if (ret == null) {
>     // This can happen if we lost the connection and subsequently reconnected or timed out.
>     send(m);
> }
> messages[next] = null;
> LOG.debug("Got Response: {}", ret);
> return ret;
> {code}
> When the response slot is empty, {{send}} retries with {{send(m)}} but discards that call's return value, so it returns null. The null response is then added to the list built in [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClientPool.java#L65-L66]:
> {code:java}
> for (String s : servers) {
>     HBMessage response = getClientForServer(s).send(m);
>     responses.add(response);
> }
> {code}
> and that list is iterated in [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/PaceMakerStateStorage.java#L195]:
> 
> {code:java}
> for (HBMessage response : responses) {
>     if (response.get_type() != HBServerMessageType.GET_ALL_NODES_FOR_PATH_RESPONSE) {
>         LOG.error("get_worker_hb_children: Invalid Response Type");
>         continue;
>     }
>     if (response.get_data().get_nodes().get_pulseIds() != null) {
>         retSet.addAll(response.get_data().get_nodes().get_pulseIds());
>     }
> }
> {code}
> 
> This is where the NPE happens: {{response.get_type()}} is called on a null {{response}}. Because the exception is thrown inside the {{doCleanup}} timer task, Nimbus treats it as a fatal timer error and halts the process, as the log above shows.
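A minimal sketch of the defensive changes the analysis above points to. It is not the fix committed for STORM-3012, and it uses simplified, hypothetical stand-ins instead of Storm's real types: {{Response}} in place of {{HBMessage}}, a plain string for the message type, and hypothetical helpers {{sendWithRetry}}, {{sendAll}} and {{workerHbChildren}}. It illustrates (1) returning the retried result instead of discarding it, (2) never adding a null response to the aggregated list, and (3) a null check before the type is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch only: hypothetical stand-ins, not Storm's real HBMessage/PacemakerClient types.
class PacemakerNullSafetySketch {
    static final String GET_ALL_NODES_FOR_PATH_RESPONSE = "GET_ALL_NODES_FOR_PATH_RESPONSE";

    // Hypothetical stand-in for HBMessage: a message type plus the pulse ids it carries.
    static final class Response {
        final String type;
        final List<String> pulseIds; // may be null, as the original code checks

        Response(String type, List<String> pulseIds) {
            this.type = type;
            this.pulseIds = pulseIds;
        }
    }

    // (1) Retry when the response slot is empty, but return the retried result
    // instead of discarding it. A bounded loop replaces the unbounded recursion.
    static Response sendWithRetry(Supplier<Response> attempt, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Response ret = attempt.get();
            if (ret != null) {
                return ret;
            }
        }
        return null; // still possible: callers must handle "no response"
    }

    // (2) Like PacemakerClientPool.sendAll, but never adds a null response to the list.
    static List<Response> sendAll(List<String> servers, Function<String, Response> sendTo) {
        List<Response> responses = new ArrayList<>();
        for (String server : servers) {
            Response response = sendTo.apply(server);
            if (response != null) {
                responses.add(response);
            }
        }
        return responses;
    }

    // (3) Like get_worker_hb_children, with a null check before the type is read.
    static Set<String> workerHbChildren(List<Response> responses) {
        Set<String> retSet = new HashSet<>();
        for (Response response : responses) {
            if (response == null || !GET_ALL_NODES_FOR_PATH_RESPONSE.equals(response.type)) {
                continue; // the original logs "Invalid Response Type" here
            }
            if (response.pulseIds != null) {
                retSet.addAll(response.pulseIds);
            }
        }
        return retSet;
    }

    public static void main(String[] args) {
        // Server "n1" returns nothing (as after a Pacemaker restart); "n2" answers.
        List<Response> responses = sendAll(Arrays.asList("n1", "n2"), server ->
                server.equals("n1") ? null
                        : new Response(GET_ALL_NODES_FOR_PATH_RESPONSE, Arrays.asList("topo-1")));
        System.out.println(workerHbChildren(responses)); // prints [topo-1]
    }
}
```

Running {{main}} prints [topo-1]: the server that returned nothing is skipped instead of crashing the aggregation. Skipping is only one option; a real fix might instead surface the missing response as an error, and this sketch does not decide which behavior is right for Nimbus.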



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)