Posted to user@storm.apache.org by Joaquin Menchaca <jm...@gobalto.com> on 2016/12/22 23:22:37 UTC

How could one debug the root cause of this error?

org.apache.storm.utils.NimbusLeaderNotFoundException: Found nimbuses [] none of which is elected as leader, please try again after some time.
	at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:85)
	at org.apache.storm.ui.core$cluster_summary.invoke(core.clj:351)
	at org.apache.storm.ui.core$fn__12593.invoke(core.clj:931)
	at org.apache.storm.shade.compojure.core$make_route$fn__4631.invoke(core.clj:100)
	at org.apache.storm.shade.compojure.core$if_route$fn__4619.invoke(core.clj:46)
	at org.apache.storm.shade.compojure.core$if_method$fn__4612.invoke(core.clj:31)
	at org.apache.storm.shade.compojure.core$routing$fn__4637.invoke(core.clj:113)
	at clojure.core$some.invoke(core.clj:2570)
	at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:113)
	at clojure.lang.RestFn.applyTo(RestFn.java:139)
	at clojure.core$apply.invoke(core.clj:632)
	at org.apache.storm.shade.compojure.core$routes$fn__4641.invoke(core.clj:118)
	at org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__12065.invoke(json.clj:56)
	at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__5766.invoke(multipart_params.clj:118)
	at org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__11217.invoke(reload.clj:22)
	at org.apache.storm.ui.helpers$requests_middleware$fn__6019.invoke(helpers.clj:50)
	at org.apache.storm.ui.core$catch_errors$fn__12786.invoke(core.clj:1225)
	at org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__5686.invoke(keyword_params.clj:35)
	at org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__5729.invoke(nested_params.clj:84)
	at org.apache.storm.shade.ring.middleware.params$wrap_params$fn__5658.invoke(params.clj:64)
	at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__5766.invoke(multipart_params.clj:118)
	at org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__5981.invoke(flash.clj:35)
	at org.apache.storm.shade.ring.middleware.session$wrap_session$fn__5967.invoke(session.clj:98)
	at org.apache.storm.shade.ring.util.servlet$make_service_method$fn__5516.invoke(servlet.clj:127)
	at org.apache.storm.shade.ring.util.servlet$servlet$fn__5520.invoke(servlet.clj:136)
	at org.apache.storm.shade.ring.util.servlet.proxy$javax.servlet.http.HttpServlet$ff19274a.service(Unknown Source)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:654)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1320)
	at org.apache.storm.logging.filters.AccessLoggingFilter.handle(AccessLoggingFilter.java:47)
	at org.apache.storm.logging.filters.AccessLoggingFilter.doFilter(AccessLoggingFilter.java:39)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
	at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
	at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
	at org.apache.storm.ui.helpers$x_frame_options_filter_handler$fn__6112.invoke(helpers.clj:189)
	at org.apache.storm.ui.helpers.proxy$java.lang.Object$Filter$abec9a8f.doFilter(Unknown Source)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
	at org.apache.storm.shade.org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:247)
	at org.apache.storm.shade.org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:210)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:443)
	at org.apache.storm.shade.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1044)
	at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:372)
	at org.apache.storm.shade.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:978)
	at org.apache.storm.shade.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.apache.storm.shade.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.apache.storm.shade.org.eclipse.jetty.server.Server.handle(Server.java:369)
	at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:486)
	at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:933)
	at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:995)
	at org.apache.storm.shade.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
	at org.apache.storm.shade.org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.apache.storm.shade.org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.apache.storm.shade.org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
	at org.apache.storm.shade.org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
	at org.apache.storm.shade.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.apache.storm.shade.org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
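
One low-effort first step is to read the message itself: the empty brackets after "Found nimbuses" suggest that the list of nimbuses the client obtained was empty, as opposed to a list that simply had no leader in it. The Python sketch below separates those two cases in pasted log text. The function name and the returned labels are made up for illustration, and the reading of `[]` is an assumption based on the wording of the message, not on Storm documentation.

```python
import re

# Matches the message format shown in the trace above. The bracketed part is
# assumed to be the printed list of nimbuses that the client received.
_PATTERN = re.compile(
    r"Found nimbuses\s*\[(?P<nimbuses>.*)\]\s*none of which is elected as leader",
    re.DOTALL,
)


def classify_leader_error(message):
    """Return a coarse hint for a NimbusLeaderNotFoundException message."""
    # Log output is often hard-wrapped, so collapse all whitespace first.
    flat = " ".join(message.split())
    match = _PATTERN.search(flat)
    if match is None:
        return "unrecognized"
    if match.group("nimbuses").strip() == "":
        # Empty list: nothing was registered, or the registration was unreadable.
        return "no-nimbus-registered"
    return "nimbus-registered-but-no-leader"
```

An empty list points toward the state Storm keeps in ZooKeeper (or toward Nimbus not running at all), which is where the rest of this thread ends up.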



-- 

Thus a victorious army wins first and then seeks battle; a defeated army fights first and then seeks victory.

Re: How could one debug the root cause of this error?

Posted by Joaquin Menchaca <jm...@gobalto.com>.
I am used to working with ActiveRecord in Rails.  I wish there were a
'zk:migrate' script, or other tooling to avoid this situation, as I had
never touched ZooKeeper before Storm, so I am not an expert on it.  If I had
known how to wipe out only the current Storm state, I would have done that
instead of the nuclear option, i.e. rm -rf.

I wish that the root key were unique or had a schema version in it, so
that Storm 1.x would use a different root. Then this could be avoided.
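
To make the idea concrete, here is a small Python sketch of what a version-aware root could look like. Storm does have a `storm.zookeeper.root` setting (default `/storm`), but nothing below is something Storm does on its own: the function name and the `-v<major>` naming scheme are hypothetical, and an operator would have to set the resulting value in the Storm configuration by hand.

```python
def versioned_zk_root(storm_version, base="/storm"):
    """Derive a per-major-version ZooKeeper root, e.g. '/storm-v1' for '1.0.2'.

    Hypothetical naming scheme for illustration only; Storm itself uses the
    plain value of storm.zookeeper.root.
    """
    major = storm_version.strip().split(".")[0]
    if not major.isdigit():
        raise ValueError("unexpected version string: %r" % storm_version)
    return "%s-v%s" % (base.rstrip("/"), major)
```

With separate roots, a 1.x cluster would start from empty state and never try to read what a 0.9 cluster wrote.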


Re: How could one debug the root cause of this error?

Posted by Erik Weathers <ew...@groupon.com>.
On Fri, Jan 13, 2017 at 8:56 PM, Joaquin Menchaca <jm...@gobalto.com>
wrote:

> I bounced everything across the cluster and it fixed the problem.
> ZooKeeper occasionally has data in a broken state.  There is no data
> integrity check yet.
>
> I also found I ran out of space on ZooKeeper as it is chatty and keeps
> gigabytes of archives. I turned that off.
>
> One time when I upgraded from 0.9 to 1.0, the zk data was so messed up
> that there were lots of crashes.
>

For completeness, that's an unfortunate but expected behavior, because
Storm stores lots of serialized objects into ZooKeeper, and the 0.9 to 1.0
change included backwards-incompatible changes that broke the
deserialization.  The most pervasive of those changes was the package path
change from "backtype.*" to "org.apache.*", but there might have been
others.   I agree that it would be nice if there was some validation to
decide whether state should be rejected.
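
As a rough illustration of why a package rename alone is enough to break stored state, here is an analogy in Python. It uses `pickle` rather than the serialization Storm actually uses, and the module names `backtype_storm` and `org_apache_storm` and the `Assignment` class are stand-ins invented for the sketch. The point is only that serialized bytes record the fully qualified class name, so they cannot be loaded once that name no longer resolves.

```python
import pickle
import sys
import types


def register_module(name):
    """Create an in-memory module that defines a tiny Assignment class."""
    module = types.ModuleType(name)
    module.Assignment = type("Assignment", (object,), {"__module__": name})
    sys.modules[name] = module
    return module


def simulate_package_rename():
    """Serialize under the old module name, then try to load after the rename."""
    old = register_module("backtype_storm")   # stand-in for the old package
    record = old.Assignment()
    record.topology_id = "wordcount-1"
    stored = pickle.dumps(record)             # plays the role of bytes kept in ZooKeeper

    del sys.modules["backtype_storm"]         # the "upgrade": the old package is gone
    register_module("org_apache_storm")       # same class, new package name
    try:
        pickle.loads(stored)
    except ImportError as exc:                # ModuleNotFoundError is a subclass
        return type(exc).__name__
    return "loaded"
```

Calling `simulate_package_rename()` fails at load time even though an identical class exists under the new name, which is the same shape of failure described above.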

- Erik


> I blasted manually (rm -rf) all zk data, and that fixed things up.

Re: How could one debug the root cause of this error?

Posted by Joaquin Menchaca <jm...@gobalto.com>.
I bounced everything across the cluster and it fixed the problem.  ZooKeeper
occasionally has data in a broken state.  There is no data integrity check
yet.

I also found I ran out of space on ZooKeeper as it is chatty and keeps
gigabytes of archives. I turned that off.

One time when I upgraded from 0.9 to 1.0, the zk data was so messed up that
there were lots of crashes.  I blasted manually (rm -rf) all zk data, and
that fixed things up.


Re: How could one debug the root cause of this error?

Posted by Hugo Da Cruz Louro <hl...@hortonworks.com>.
Also, if it’s possible (i.e. you don’t have any state that you need to keep), you can delete the Storm ZooKeeper nodes by logging into zkCli and executing ‘rmr /storm’ (‘deleteall /storm’ on ZooKeeper 3.5 and later), and then restarting Storm and redeploying your topology.
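
For anyone who would rather script this than type it into zkCli, here is a rough Python equivalent. It assumes the kazoo client library (`KazooClient` with its `exists`, `get_children` and `delete(path, recursive=True)` methods), that Storm is using its default root `/storm`, and a hypothetical ZooKeeper address `zk1:2181`. The helper name `wipe_storm_root` is invented for this sketch. It defaults to a dry run because the delete cannot be undone.

```python
def wipe_storm_root(zk, root="/storm", dry_run=True):
    """Recursively delete Storm's ZooKeeper root and return the children found.

    `zk` is assumed to be a started kazoo.client.KazooClient, or any object
    offering the same exists/get_children/delete methods.
    """
    if not root.startswith("/") or root == "/":
        raise ValueError("refusing to operate on %r" % root)
    if not zk.exists(root):
        return []
    children = sorted(zk.get_children(root))
    if not dry_run:
        # Irreversible: removes the root node and everything below it.
        zk.delete(root, recursive=True)
    return children


# Example use (needs kazoo installed and a reachable ensemble; the host is hypothetical):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="zk1:2181")
#   zk.start()
#   print(wipe_storm_root(zk))            # dry run: only lists the children of /storm
#   wipe_storm_root(zk, dry_run=False)    # actually deletes
#   zk.stop()
```

As with the zkCli route, stop the Storm daemons first and restart them afterwards, then redeploy the topology.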


Re: How could one debug the root cause of this error?

Posted by Hugo Da Cruz Louro <hl...@hortonworks.com>.
Is it doable for you to restart your zookeeper cluster? If possible, can you do so, and then restart storm and deploy your storm topology again.

On Dec 22, 2016, at 3:22 PM, Joaquin Menchaca <jm...@gobalto.com> wrote:


Found nimbuses [] none of which is elected as leader, please try again after some time