Posted to user@storm.apache.org by SuXingyu <xi...@outlook.com> on 2015/05/07 09:44:26 UTC

Storm worker died - Client is being closed, and does not take requests any more

Hi all,

I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my storm cluster. Most of the time it works fine, but after I redeployed the topology yesterday, I found that all four workers in the storm cluster restarted after running for several minutes.

I checked the Storm logs and found this:

2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
        ... 6 common frames omitted
2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
java.lang.RuntimeException: ("Async loop died!")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

These logs appeared several times over about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.
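
(For reference, I did the rebalance from the Storm UI; I believe the CLI equivalent is roughly the following sketch, where <topology-name> is a placeholder for the name the topology was submitted under and -w is the rebalance wait time in seconds:)

storm rebalance <topology-name> -w 30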

Any idea why this may be happening?

Best regards!

Xingyu





Re: Storm worker died - Client is being closed, and does not take requests any more

Posted by 임정택 <ka...@gmail.com>.
AFAIK, Nimbus doesn't reassign tasks to other workers under normal circumstances.
Maybe the worker didn't send heartbeats properly, or Nimbus detected that the worker
was dead. Actually those two are the same thing, since Nimbus detects a dead worker
by checking its heartbeat.
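
For reference, the relevant settings live in storm.yaml. Below is a minimal sketch with what I believe are the 0.9.x defaults - please verify against the defaults.yaml shipped with your release:

# how often a worker heartbeats to its local supervisor
worker.heartbeat.frequency.secs: 1
# how often executors heartbeat to ZooKeeper (which Nimbus reads)
task.heartbeat.frequency.secs: 3
# the supervisor kills a worker whose heartbeat is older than this
supervisor.worker.timeout.secs: 30
# Nimbus marks executors "not alive" and reassigns them after this
nimbus.task.timeout.secs: 30
nimbus.task.launch.secs: 120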

Hope this helps.

Thanks.
Jungtaek Lim (HeartSaVioR)

2015-05-11 13:08 GMT+09:00 SuXingyu <xi...@outlook.com>:

> It seems that this issue matches our situation best. I am wondering in
> what condition will the worker be moved to other slots except rebalance? Is
> moving worker a normal operation of nimbus?
>
> Anyway, sincerely thank you for your help!
>
> ------------------------------
> Date: Fri, 8 May 2015 20:35:50 +0900
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> From: kabhwan@gmail.com
> To: user@storm.apache.org
>
>
> If one or more workers were moved to other slots, you may hit this issue.
>
> https://issues.apache.org/jira/browse/STORM-737
>
> Hope this helps.
>
>
> 2015-05-08 18:53 GMT+09:00 SuXingyu <xi...@outlook.com>:
>
> I am running my cluster using Storm 0.9.3.  This exception was caused by
> sending msg while Netty client is closing. But still don't know why Netty
> client closed.
>
> ------------------------------
>  From: eweathers@groupon.com
> Date: Thu, 7 May 2015 23:34:16 -0700
>
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: user@storm.apache.org
>
> What version of Storm are you running?   I had significant issues with
> Storm v0.9.3 having what I'll term "congestive collapse" related to the
> Netty implementation.  Workers would continually die.  The problem was
> resolved by switching the transport layer from Netty back to the original
> ZeroMQ implementation, but that was not very trivial.
>
> - Erik
>
> On Thu, May 7, 2015 at 11:28 PM, SuXingyu <xi...@outlook.com> wrote:
>
> Yes, I've found the logs you metioned in Nimbus log. But still can't find
> any indication in worker logs. The first log genarated by storm after work
> start is "Async loop died".
>
> ------------------------------
> From: eweathers@groupon.com
> Date: Thu, 7 May 2015 22:56:41 -0700
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: user@storm.apache.org
>
>
> The "State" being ":disallowed" usually indicates that the worker's
> executors' slot assignment has changed.  If you look at the Nimbus logs, I
> would expect to see logs stating that the affected executors were noticed
> to be "not alive" by the Nimbus, and then an immediate "Setting new
> assignment for topology id" log line as the Nimbus creates a new assignment
> for this topology.  As for why the Worker itself died -- is there no
> indication in the worker logs?
>
> - Erik
>
> On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:
>
> I checked my supervisor log, and found logs like  "Shutting down and
> clearing state for id {worker.id}". But still can't find the cause of
> worker shutting down.
>
> Here are the logs in supervisor. It seems that worker's jvm exit before
> supervisor kill it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "message_storm-101-17-1430903865", :executors ...} for this supervisor
> 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
> 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with
> command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g'
> '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m'
> '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4'
> '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
> '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
> '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime'
> '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log'
> '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
> '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
> '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
> '-Dstorm.id=message_storm-101-17-1430903865'
> '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701'
> '-cp'
> '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
> 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
> '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
> '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
> 1430904337. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305,
> :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
>
>
>
>
> ------------------------------
> From: JMaass@cccis.com
> To: user@storm.apache.org
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
>
> Look into your supervisor logs, with them set to INFO.
>
>  Look for something like:
> “Shutting down and clearing state for id {worker.id}”
>
>  That id will match your worker’s logs.
>
>  The reason for the shutdown will be in the supervisor’s logs.
>
>
>
>
>   From: SuXingyu <xi...@outlook.com>
> Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
> Date: 2015,Thursday, May 7 at 02:44
> To: "user@storm.apache.org" <us...@storm.apache.org>
> Subject: Storm worker died - Client is being closed, and does not take
> requests any more
>
>   Hi all,
>
> I am runing a Kafka-Storm cluster in distributed mode. There are four
> supervisors in my storm cluster. For most of the time, it works fun. But
> after I redeployed topology yesterday, I found all the four workers in
> storm cluster restart after running several minutes.
>
> I check the log in storm and find this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop
> died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
> [clojure-1.5.1.jar:na]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92)
> [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
> message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appear several times in about twenty minutes. And the cluter
> restored after a reblance operation in Storm ui.
>
> Any idea why this may be happening?
>
> Best regard!
>
> Xingyu
>
>
>
>
>
>
>
>
>
> --
> Name : 임 정택
> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior
>



-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

RE: Storm worker died - Client is being closed, and does not take requests any more

Posted by SuXingyu <xi...@outlook.com>.
It seems that this issue matches our situation best. I am wondering under what conditions a worker would be moved to other slots, other than during a rebalance. Is moving a worker a normal Nimbus operation?

Anyway, sincerely thank you for your help!
 
Date: Fri, 8 May 2015 20:35:50 +0900
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
From: kabhwan@gmail.com
To: user@storm.apache.org

If one or more workers were moved to other slots, you may hit this issue.
https://issues.apache.org/jira/browse/STORM-737

Hope this helps.

2015-05-08 18:53 GMT+09:00 SuXingyu <xi...@outlook.com>:



I am running my cluster using Storm 0.9.3. This exception was caused by sending a message while the Netty client was closing. But I still don't know why the Netty client closed.

 From: eweathers@groupon.com
Date: Thu, 7 May 2015 23:34:16 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
To: user@storm.apache.org

What version of Storm are you running?   I had significant issues with Storm v0.9.3 having what I'll term "congestive collapse" related to the Netty implementation.  Workers would continually die.  The problem was resolved by switching the transport layer from Netty back to the original ZeroMQ implementation, but that was not very trivial.
- Erik
On Thu, May 7, 2015 at 11:28 PM, SuXingyu <xi...@outlook.com> wrote:



Yes, I've found the logs you mentioned in the Nimbus log. But I still can't find any indication in the worker logs. The first log generated by Storm after the worker started is "Async loop died".

From: eweathers@groupon.com
Date: Thu, 7 May 2015 22:56:41 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
To: user@storm.apache.org

The "State" being ":disallowed" usually indicates that the worker's executors' slot assignment has changed.  If you look at the Nimbus logs, I would expect to see logs stating that the affected executors were noticed to be "not alive" by the Nimbus, and then an immediate "Setting new assignment for topology id" log line as the Nimbus creates a new assignment for this topology.  As for why the Worker itself died -- is there no indication in the worker logs?
- Erik
On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:



I checked my supervisor log and found entries like "Shutting down and clearing state for id {worker.id}". But I still can't find the cause of the worker shutting down.

Here are the supervisor logs. It seems that the worker's JVM exited before the supervisor killed it.

2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "message_storm-101-17-1430903865", :executors ...} for this supervisor 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log' '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' '-Dstorm.id=message_storm-101-17-1430903865' '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp' '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar' 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' '95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 1430904337. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 
95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83




From: JMaass@cccis.com
To: user@storm.apache.org
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
Date: Thu, 7 May 2015 15:15:50 +0000






Look into your supervisor logs, with them set to INFO.



Look for something like:
“Shutting down and clearing state for id {worker.id}”



That id will match your worker’s logs.



The reason for the shutdown will be in the supervisor’s logs.

From: SuXingyu <xi...@outlook.com>

Reply-To: "user@storm.apache.org" <us...@storm.apache.org>

Date: 2015,Thursday, May 7 at 02:44

To: "user@storm.apache.org" <us...@storm.apache.org>

Subject: Storm worker died - Client is being closed, and does not take requests any more







Hi all,



I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my storm cluster. Most of the time it works fine, but after I redeployed the topology yesterday, I found that all four workers in the storm cluster restarted after running for several minutes.



I checked the Storm logs and found this:



2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!

java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]

        ... 6 common frames omitted

2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")

java.lang.RuntimeException: ("Async loop died!")

        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700

2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT



These logs appeared several times over about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.



Any idea why this may be happening?



Best regard!



Xingyu



-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

Re: Storm worker died - Client is being closed, and does not take requests any more

Posted by 임정택 <ka...@gmail.com>.
If one or more workers were moved to other slots, you may hit this issue.

https://issues.apache.org/jira/browse/STORM-737
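
One way to confirm that workers were actually moved is to check the Nimbus log for reassignments (a sketch - the nimbus.log file name and location are assumptions based on storm.log.dir; the log line text is taken from Erik's description quoted below):

grep -n "not alive" /data0/storm/logs/nimbus.log
grep -n "Setting new assignment for topology id" /data0/storm/logs/nimbus.log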

Hope this helps.


2015-05-08 18:53 GMT+09:00 SuXingyu <xi...@outlook.com>:

> I am running my cluster using Storm 0.9.3.  This exception was caused by
> sending msg while Netty client is closing. But still don't know why Netty
> client closed.
>
> ------------------------------
>  From: eweathers@groupon.com
> Date: Thu, 7 May 2015 23:34:16 -0700
>
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: user@storm.apache.org
>
> What version of Storm are you running?   I had significant issues with
> Storm v0.9.3 having what I'll term "congestive collapse" related to the
> Netty implementation.  Workers would continually die.  The problem was
> resolved by switching the transport layer from Netty back to the original
> ZeroMQ implementation, but that was not very trivial.
>
> - Erik
>
> On Thu, May 7, 2015 at 11:28 PM, SuXingyu <xi...@outlook.com> wrote:
>
> Yes, I've found the logs you metioned in Nimbus log. But still can't find
> any indication in worker logs. The first log genarated by storm after work
> start is "Async loop died".
>
> ------------------------------
> From: eweathers@groupon.com
> Date: Thu, 7 May 2015 22:56:41 -0700
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: user@storm.apache.org
>
>
> The "State" being ":disallowed" usually indicates that the worker's
> executors' slot assignment has changed.  If you look at the Nimbus logs, I
> would expect to see logs stating that the affected executors were noticed
> to be "not alive" by the Nimbus, and then an immediate "Setting new
> assignment for topology id" log line as the Nimbus creates a new assignment
> for this topology.  As for why the Worker itself died -- is there no
> indication in the worker logs?
>
> - Erik
>
> On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:
>
> I checked my supervisor log, and found logs like  "Shutting down and
> clearing state for id {worker.id}". But still can't find the cause of
> worker shutting down.
>
> Here are the logs in supervisor. It seems that worker's jvm exit before
> supervisor kill it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "message_storm-101-17-1430903865", :executors ...} for this supervisor
> 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
> 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with
> command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g'
> '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m'
> '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4'
> '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
> '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
> '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime'
> '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log'
> '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
> '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
> '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
> '-Dstorm.id=message_storm-101-17-1430903865'
> '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701'
> '-cp'
> '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
> 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
> '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
> '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
> 1430904337. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305,
> :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
>
>
>
>
> ------------------------------
> From: JMaass@cccis.com
> To: user@storm.apache.org
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
>
> Look into your supervisor logs, with them set to INFO.
>
>  Look for something like:
> “Shutting down and clearing state for id {worker.id}”
>
>  That id will match your worker’s logs.
>
>  The reason for the shutdown will be in the supervisor’s logs.
>
>
>
>
>   From: SuXingyu <xi...@outlook.com>
> Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
> Date: 2015,Thursday, May 7 at 02:44
> To: "user@storm.apache.org" <us...@storm.apache.org>
> Subject: Storm worker died - Client is being closed, and does not take
> requests any more
>
>   Hi all,
>
> I am runing a Kafka-Storm cluster in distributed mode. There are four
> supervisors in my storm cluster. For most of the time, it works fun. But
> after I redeployed topology yesterday, I found all the four workers in
> storm cluster restart after running several minutes.
>
> I check the log in storm and find this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop
> died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
> [clojure-1.5.1.jar:na]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92)
> [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
> message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appear several times in about twenty minutes. And the cluter
> restored after a reblance operation in Storm ui.
>
> Any idea why this may be happening?
>
> Best regard!
>
> Xingyu
>
>
>
>
>
>
>


-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

RE: Storm worker died - Client is being closed, and does not take requests any more

Posted by SuXingyu <xi...@outlook.com>.
I am running my cluster using Storm 0.9.3. This exception was caused by sending a message while the Netty client was closing. But I still don't know why the Netty client closed.
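
One way to correlate the two sides is something like the following sketch (storm.log.dir and the worker-<port>.log file name come from the worker launch command quoted below, adjusted per worker port; the supervisor.log file name is an assumption):

grep -n "Shutting down and clearing state" /data0/storm/logs/supervisor.log
grep -n "Client is being closed" /data0/storm/logs/worker-6701.log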

 From: eweathers@groupon.com
Date: Thu, 7 May 2015 23:34:16 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
To: user@storm.apache.org

What version of Storm are you running?   I had significant issues with Storm v0.9.3 having what I'll term "congestive collapse" related to the Netty implementation.  Workers would continually die.  The problem was resolved by switching the transport layer from Netty back to the original ZeroMQ implementation, but that was not very trivial.
- Erik
On Thu, May 7, 2015 at 11:28 PM, SuXingyu <xi...@outlook.com> wrote:



Yes, I've found the logs you mentioned in the Nimbus log. But I still can't find any indication in the worker logs. The first log generated by Storm after the worker started is "Async loop died".

From: eweathers@groupon.com
Date: Thu, 7 May 2015 22:56:41 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
To: user@storm.apache.org

The "State" being ":disallowed" usually indicates that the worker's executors' slot assignment has changed.  If you look at the Nimbus logs, I would expect to see logs stating that the affected executors were noticed to be "not alive" by the Nimbus, and then an immediate "Setting new assignment for topology id" log line as the Nimbus creates a new assignment for this topology.  As for why the Worker itself died -- is there no indication in the worker logs?
- Erik
On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:



I checked my supervisor log and found entries like "Shutting down and clearing state for id {worker.id}". But I still can't find the cause of the worker shutting down.

Here are the supervisor logs. It seems that the worker's JVM exited before the supervisor killed it.

2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "message_storm-101-17-1430903865", :executors ...} for this supervisor 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log' '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' '-Dstorm.id=message_storm-101-17-1430903865' '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp' '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar' 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' '95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 1430904337. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 
95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83




From: JMaass@cccis.com
To: user@storm.apache.org
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
Date: Thu, 7 May 2015 15:15:50 +0000






Look into your supervisor logs, with them set to INFO.



Look for something like:
“Shutting down and clearing state for id {worker.id}”



That id will match your worker’s logs.



The reason for the shutdown will be in the supervisor’s logs.

From: SuXingyu <xi...@outlook.com>

Reply-To: "user@storm.apache.org" <us...@storm.apache.org>

Date: 2015,Thursday, May 7 at 02:44

To: "user@storm.apache.org" <us...@storm.apache.org>

Subject: Storm worker died - Client is being closed, and does not take requests any more







Hi all,



I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my storm cluster. Most of the time it works fine, but after I redeployed the topology yesterday, I found that all four workers in the storm cluster restarted after running for several minutes.



I checked the Storm logs and found this:



2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!

java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]

        ... 6 common frames omitted

2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")

java.lang.RuntimeException: ("Async loop died!")

        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700

2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT



These logs appeared several times over about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.



Any idea why this may be happening?



Best regard!



Xingyu


Re: Storm worker died - Client is being closed, and does not take requests any more

Posted by Erik Weathers <ew...@groupon.com>.
What version of Storm are you running?   I had significant issues with
Storm v0.9.3 having what I'll term "congestive collapse" related to the
Netty implementation.  Workers would continually die.  The problem was
resolved by switching the transport layer from Netty back to the original
ZeroMQ implementation, but that was not very trivial.
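
For reference, the transport is selected in storm.yaml. A rough sketch (class names from memory - the ZeroMQ transport lives outside storm-core in 0.9.x, so the exact class name and packaging depend on your build):

# Netty transport (the 0.9.x default)
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
# ZeroMQ transport (requires the separate 0mq module plus native zeromq/jzmq)
# storm.messaging.transport: "backtype.storm.messaging.zmq"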

- Erik

On Thu, May 7, 2015 at 11:28 PM, SuXingyu <xi...@outlook.com> wrote:

> Yes, I've found the logs you metioned in Nimbus log. But still can't find
> any indication in worker logs. The first log genarated by storm after work
> start is "Async loop died".
>
> ------------------------------
> From: eweathers@groupon.com
> Date: Thu, 7 May 2015 22:56:41 -0700
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: user@storm.apache.org
>
>
> The "State" being ":disallowed" usually indicates that the worker's
> executors' slot assignment has changed.  If you look at the Nimbus logs, I
> would expect to see logs stating that the affected executors were noticed
> to be "not alive" by the Nimbus, and then an immediate "Setting new
> assignment for topology id" log line as the Nimbus creates a new assignment
> for this topology.  As for why the Worker itself died -- is there no
> indication in the worker logs?
>
> - Erik
>
> On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:
>
> I checked my supervisor log, and found logs like  "Shutting down and
> clearing state for id {worker.id}". But still can't find the cause of
> worker shutting down.
>
> Here are the logs in supervisor. It seems that worker's jvm exit before
> supervisor kill it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "message_storm-101-17-1430903865", :executors ...} for this supervisor
> 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
> 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with
> command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g'
> '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m'
> '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4'
> '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
> '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
> '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime'
> '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log'
> '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
> '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
> '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
> '-Dstorm.id=message_storm-101-17-1430903865'
> '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701'
> '-cp'
> '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
> 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
> '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
> '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
> 1430904337. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305,
> :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
>
>
>
>
> ------------------------------
> From: JMaass@cccis.com
> To: user@storm.apache.org
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
>
> Look into your supervisor logs, with them set to INFO.
>
>  Look for something like:
> “Shutting down and clearing state for id {worker.id}”
>
>  That id will match your worker’s logs.
>
>  The reason for the shutdown will be in the supervisor’s logs.
>
>
>
>
>   From: SuXingyu <xi...@outlook.com>
> Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
> Date: 2015,Thursday, May 7 at 02:44
> To: "user@storm.apache.org" <us...@storm.apache.org>
> Subject: Storm worker died - Client is being closed, and does not take
> requests any more
>
>   Hi all,
>
> I am runing a Kafka-Storm cluster in distributed mode. There are four
> supervisors in my storm cluster. For most of the time, it works fun. But
> after I redeployed topology yesterday, I found all the four workers in
> storm cluster restart after running several minutes.
>
> I check the log in storm and find this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop
> died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
> [clojure-1.5.1.jar:na]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92)
> [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
> message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appear several times in about twenty minutes. And the cluter
> restored after a reblance operation in Storm ui.
>
> Any idea why this may be happening?
>
> Best regard!
>
> Xingyu
>
>
>
>
>
>

RE: Storm worker died - Client is being closed, and does not take requests any more

Posted by SuXingyu <xi...@outlook.com>.
Yes, I've found the logs you mentioned in the Nimbus log. But I still can't find any indication in the worker logs. The first log generated by Storm after the worker starts is "Async loop died".

From: eweathers@groupon.com
Date: Thu, 7 May 2015 22:56:41 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
To: user@storm.apache.org

The "State" being ":disallowed" usually indicates that the worker's executors' slot assignment has changed.  If you look at the Nimbus logs, I would expect to see logs stating that the affected executors were noticed to be "not alive" by the Nimbus, and then an immediate "Setting new assignment for topology id" log line as the Nimbus creates a new assignment for this topology.  As for why the Worker itself died -- is there no indication in the worker logs?
- Erik
On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:



I checked my supervisor log and found logs like "Shutting down and clearing state for id {worker.id}", but I still can't find the cause of the worker shutting down.

Here are the logs from the supervisor. It seems that the worker's JVM exited before the supervisor killed it.

2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "message_storm-101-17-1430903865", :executors ...} for this supervisor 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log' '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' '-Dstorm.id=message_storm-101-17-1430903865' '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp' '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar' 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' '95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 1430904337. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83




From: JMaass@cccis.com
To: user@storm.apache.org
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
Date: Thu, 7 May 2015 15:15:50 +0000

Look into your supervisor logs, with them set to INFO.

Look for something like:
“Shutting down and clearing state for id {worker.id}”

That id will match your worker’s logs.

The reason for the shutdown will be in the supervisor’s logs.

From: SuXingyu <xi...@outlook.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: 2015,Thursday, May 7 at 02:44
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Storm worker died - Client is being closed, and does not take requests any more

Hi all,

I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my Storm cluster. Most of the time it works fine, but after I redeployed a topology yesterday I found that all four workers in the Storm cluster restarted after running for several minutes.

I checked the Storm logs and found this:

2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!

java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]

        ... 6 common frames omitted

2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")

java.lang.RuntimeException: ("Async loop died!")

        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700

2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT



These logs appeared several times in about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.

Any idea why this may be happening?

Best regards!

Xingyu

Re: Storm worker died - Client is being closed, and does not take requests any more

Posted by Erik Weathers <ew...@groupon.com>.
The "State" being ":disallowed" usually indicates that the worker's
executors' slot assignment has changed.  If you look at the Nimbus logs, I
would expect to see logs stating that the affected executors were noticed
to be "not alive" by the Nimbus, and then an immediate "Setting new
assignment for topology id" log line as the Nimbus creates a new assignment
for this topology.  As for why the Worker itself died -- is there no
indication in the worker logs?
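
If it helps to line those two up, here is a minimal sketch in plain Java 8 (nothing Storm-specific) that pulls both kinds of lines out of the Nimbus log so their timestamps can be compared with the worker restart times. The default path is an assumption based on the storm.log.dir in the launch command you posted; point it at wherever Nimbus actually writes its log.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Print the Nimbus log lines that mark executors as "not alive" and the
// "Setting new assignment" lines that follow, to correlate with worker restarts.
public class ScanNimbusLog {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/data0/storm/logs/nimbus.log"; // assumed location
        try (Stream<String> lines = Files.lines(Paths.get(path))) {
            lines.filter(l -> l.contains("not alive") || l.contains("Setting new assignment"))
                 .forEach(System.out::println);
        }
    }
}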

- Erik

On Thu, May 7, 2015 at 8:36 PM, SuXingyu <xi...@outlook.com> wrote:

> I checked my supervisor log, and found logs like  "Shutting down and
> clearing state for id {worker.id}". But still can't find the cause of
> worker shutting down.
>
> Here are the logs in supervisor. It seems that worker's jvm exit before
> supervisor kill it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "message_storm-101-17-1430903865", :executors ...} for this supervisor
> 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
> 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with
> command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g'
> '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m'
> '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4'
> '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
> '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
> '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime'
> '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log'
> '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
> '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
> '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
> '-Dstorm.id=message_storm-101-17-1430903865'
> '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701'
> '-cp'
> '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
> 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
> '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
> '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
> 1430904337. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305,
> :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
>
>
>
>
> ------------------------------
> From: JMaass@cccis.com
> To: user@storm.apache.org
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
>
> Look into your supervisor logs, with them set to INFO.
>
>  Look for something like:
> “Shutting down and clearing state for id {worker.id}”
>
>  That id will match your worker’s logs.
>
>  The reason for the shutdown will be in the supervisor’s logs.
>
>
>
>
>   From: SuXingyu <xi...@outlook.com>
> Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
> Date: 2015,Thursday, May 7 at 02:44
> To: "user@storm.apache.org" <us...@storm.apache.org>
> Subject: Storm worker died - Client is being closed, and does not take
> requests any more
>
>   Hi all,
>
> I am runing a Kafka-Storm cluster in distributed mode. There are four
> supervisors in my storm cluster. For most of the time, it works fun. But
> after I redeployed topology yesterday, I found all the four workers in
> storm cluster restart after running several minutes.
>
> I check the log in storm and find this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop
> died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
> [clojure-1.5.1.jar:na]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92)
> [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
> message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appear several times in about twenty minutes. And the cluter
> restored after a reblance operation in Storm ui.
>
> Any idea why this may be happening?
>
> Best regard!
>
> Xingyu
>
>
>
>
>

RE: Storm worker died - Client is being closed, and does not take requests any more

Posted by SuXingyu <xi...@outlook.com>.
I checked my supervisor log and found logs like "Shutting down and clearing state for id {worker.id}", but I still can't find the cause of the worker shutting down.

Here are the logs from the supervisor. It seems that the worker's JVM exited before the supervisor killed it.

2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "message_storm-101-17-1430903865", :executors ...} for this supervisor 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log' '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' '-Dstorm.id=message_storm-101-17-1430903865' '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp' '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar' 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' '95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 1430904337. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
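
One thing worth noting from the launch command above: the worker JVM flags (the 8g heap, CMS, GC logging) come from worker.childopts in storm.yaml. A topology can also add its own flags at submit time through topology.worker.childopts, which is a cheap way to experiment with heap or GC settings for just this topology while debugging. Below is a minimal sketch, assuming storm-core 0.9.3 on the classpath; the flag values are illustrative assumptions, not recommendations.

import backtype.storm.Config;

public class WorkerChildOptsSketch {
    public static void main(String[] args) {
        Config conf = new Config();
        conf.setNumWorkers(4);
        // topology.worker.childopts is added to the worker launch command alongside
        // worker.childopts; with HotSpot, when -Xmx appears more than once the last
        // value wins, so this can be used to try a smaller heap for one topology.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2g -Xms2g -verbose:gc");
        // Hand this conf to StormSubmitter.submitTopology(...) together with the topology.
        System.out.println(conf);
    }
}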




From: JMaass@cccis.com
To: user@storm.apache.org
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
Date: Thu, 7 May 2015 15:15:50 +0000

Look into your supervisor logs, with them set to INFO.

Look for something like:
“Shutting down and clearing state for id {worker.id}”

That id will match your worker’s logs.

The reason for the shutdown will be in the supervisor’s logs.

From: SuXingyu <xi...@outlook.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: 2015,Thursday, May 7 at 02:44
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Storm worker died - Client is being closed, and does not take requests any more

Hi all,

I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my Storm cluster. Most of the time it works fine, but after I redeployed a topology yesterday I found that all four workers in the Storm cluster restarted after running for several minutes.

I checked the Storm logs and found this:

2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!

java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more

        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]

        ... 6 common frames omitted

2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")

java.lang.RuntimeException: ("Async loop died!")

        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]

        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]

        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]

        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700

2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT



These logs appeared several times in about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.

Any idea why this may be happening?

Best regards!

Xingyu

Re: Storm worker died - Client is being closed, and does not take requests any more

Posted by Jeff Maass <JM...@cccis.com>.
Look into your supervisor logs, with them set to INFO.

Look for something like:
“Shutting down and clearing state for id {worker.id}”

That id will match your worker’s logs.

The reason for the shutdown will be in the supervisor’s logs.
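
A quick way to do that search is a minimal sketch in plain Java 8 that prints each match with two lines of context on either side. The log path is an assumption; point it at the supervisor log under your storm.log.dir.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Find the "Shutting down and clearing state" lines in a supervisor log and
// print them with two lines of context before and after each match.
public class ScanSupervisorLog {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/data0/storm/logs/supervisor.log"; // assumed location
        List<String> lines = Files.readAllLines(Paths.get(path));
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains("Shutting down and clearing state for id")) {
                for (int j = Math.max(0, i - 2); j <= Math.min(lines.size() - 1, i + 2); j++) {
                    System.out.println(lines.get(j));
                }
                System.out.println("--");
            }
        }
    }
}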




From: SuXingyu <xi...@outlook.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: 2015,Thursday, May 7 at 02:44
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Storm worker died - Client is being closed, and does not take requests any more

Hi all,

I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my Storm cluster. Most of the time it works fine, but after I redeployed a topology yesterday I found that all four workers in the Storm cluster restarted after running for several minutes.

I checked the Storm logs and found this:

2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
        ... 6 common frames omitted
2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
java.lang.RuntimeException: ("Async loop died!")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

These logs appeared several times in about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.

Any idea why this may be happening?

Best regards!

Xingyu