Posted to issues@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2017/10/18 19:27:00 UTC

[jira] [Comment Edited] (STORM-2706) Nimbus stuck in exception and does not fail fast

    [ https://issues.apache.org/jira/browse/STORM-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209870#comment-16209870 ] 

Stig Rohde Døssing edited comment on STORM-2706 at 10/18/17 7:26 PM:
---------------------------------------------------------------------

Upgrading to 4.0.0 seems pretty easy; the tests pass with only a few minor changes. I'll poke at it a bit more and then open a PR so others can help try it out. The upgrade should probably go on this issue, so I'll assign it to myself. Let me know if the upgrade belongs in a different issue instead.


was (Author: srdo):
Upgrading to 4.0.0 seems pretty easy; the tests pass with only a few minor changes. I'll poke at it a bit more and then open a PR. The upgrade should probably go on this issue, so I'll assign it to myself. Let me know if the upgrade belongs in a different issue instead.

> Nimbus stuck in exception and does not fail fast
> ------------------------------------------------
>
>                 Key: STORM-2706
>                 URL: https://issues.apache.org/jira/browse/STORM-2706
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Bijan Fahimi Shemrani
>            Assignee: Stig Rohde Døssing
>              Labels: nimbus
>
> We are experiencing a problem in Nimbus that leaves it stuck in a retry-and-fail loop. When I restart Nimbus manually, it works again as expected. However, it would be better if Nimbus shut down instead, so that our monitoring could restart it automatically.
> The Nimbus log:
> {noformat}
> 24.8.2017 15:39:1913:39:19.804 [pool-13-thread-51] ERROR org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - Unexpected throwable while invoking!
> 24.8.2017 15:39:19org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
> 24.8.2017 15:39:19	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
> 24.8.2017 15:39:19	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> 24.8.2017 15:39:19	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19	at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[?:?]
> 24.8.2017 15:39:19	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
> 24.8.2017 15:39:19	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> 24.8.2017 15:39:19	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19	at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getLeader(nimbus.clj:2412) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
> 24.8.2017 15:39:19	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
> 24.8.2017 15:39:19	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 24.8.2017 15:39:2713:39:27.205 [pool-13-thread-52] ERROR org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - Unexpected throwable while invoking!
> 24.8.2017 15:39:27org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
> 24.8.2017 15:39:27	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
> 24.8.2017 15:39:27	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> 24.8.2017 15:39:27	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27	at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[?:?]
> 24.8.2017 15:39:27	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
> 24.8.2017 15:39:27	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> 24.8.2017 15:39:27	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27	at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
> 24.8.2017 15:39:27	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
> 24.8.2017 15:39:27	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 24.8.2017 15:39:2913:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
> 24.8.2017 15:39:2913:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
> 24.8.2017 15:39:3913:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
> 24.8.2017 15:39:3913:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
> 24.8.2017 15:39:4913:39:49.271 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
> 24.8.2017 15:39:4913:39:49.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
> 24.8.2017 15:39:5913:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
> 24.8.2017 15:39:5913:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
> 24.8.2017 15:40:0913:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
> 24.8.2017 15:40:0913:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
> 24.8.2017 15:40:1313:40:13.806 [timer] INFO  org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
> 24.8.2017 15:40:1313:40:13.807 [timer] INFO  org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@f90354
> 24.8.2017 15:40:1313:40:13.808 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.42.174.214/10.42.174.214:2181. Will not attempt to authenticate using SASL (unknown error)
> 24.8.2017 15:40:1313:40:13.862 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Socket connection established to 10.42.174.214/10.42.174.214:2181, initiating session
> 24.8.2017 15:40:1313:40:13.865 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.42.174.214/10.42.174.214:2181, sessionid = 0x15e14456dc70045, negotiated timeout = 20000
> 24.8.2017 15:40:1313:40:13.910 [timer] INFO  org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Session: 0x15e14456dc70045 closed
> 24.8.2017 15:40:1313:40:13.910 [timer-EventThread] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - EventThread shut down
> {noformat}


