You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Rushabh Shah <ru...@integral.com> on 2014/08/04 20:37:07 UTC

Storm workers not starting because of netty reconnect : [INFO] Reconnect started for Netty-Client

Hi,
I have a topology that was deployed on a storm cluster and was running fine until I started facing the following issue.

I can see that in supervisor logs, the supervisor is trying to launch the topology on a worker but it is not able to start it.



2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] [1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6702 with id f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with command: '/integral/opt/jdk16/bin/java' '-server' '-Xmx768m' '-Djava.net.preferIPv4Stack=true' '-Djava.net.preferIPv4Stack=true' '-Xmanagement:ssl=false,authenticate=false,port=7099' '-Xmx8192m' '-Djava.library.path=/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources/Linux-amd64:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6702.log' '-Dstorm.home=/integral/opt/apache-storm-0.9.2-incubating' '-Dlogback.configurationFile=/integral/opt/apache-storm-0.9.2-incubating/logback/cluster.xml' '-Dstorm.id=SALABPOSITION-5-1-2-1406938773' '-Dworker.id=f620ab27-61fd-4b87-b017-dea1e811074b' '-Dworker.port=6702' '-cp' '/integral/opt/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/integral/opt/apache-storm-0.9.2-incubating/conf:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/stormjar.jar' 'backtype.storm.daemon.worker' 'SALABPOSITION-5-1-2-1406938773' '48753f4c-e0fd-48f3-a149-1f52491da5b9' '6702' 'f620ab27-61fd-4b87-b017-dea1e811074b'
2014-08-04 18:27:33 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:33 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:34 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
.....

After 120 seconds the supervisor will timeout and try to start the topology on another worker.


2014-08-04 18:29:32 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:32 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Worker f620ab27-61fd-4b87-b017-dea1e811074b failed to start
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down and clearing state for id f620ab27-61fd-4b87-b017-dea1e811074b. Current supervisor time: 1407176973. State: :not-started, Heartbeat: nil
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down 48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shut down 48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] [1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6703 with id c290b2ec-7969-44ca-ac3e-008b8841ef3f


And this process keeps on repeating.


On the worker logs, I see the following :


2014-08-04 08:09:53 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [14]
2014-08-04 08:09:54 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [15]
2014-08-04 08:09:55 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [16]
......
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-supervisor2.integral.com/192.168.239.166:6703
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-supervisor2.integral.com/192.168.239.166:6703..., timeout: 600000ms, pendings: 0
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6701... [0]
2014-08-04 08:10:10 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:194) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:323) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 6 common frames omitted
2014-08-04 08:10:10 b.s.util [INFO] Halting process: ("Async loop died!")


It seems that the supervisor is not able to communicate with the workers because of some netty connection issues.

I would appreciate if somebody can help me in this regard.

Thanks,
Rushabh




RE: Storm workers not starting because of netty reconnect : [INFO] Reconnect started for Netty-Client

Posted by Rushabh Shah <ru...@integral.com>.
I deleted my storm local dir and zookeeper data/ log dirs and restarted zookeeper and storm again and it works fine now.

This is still very weird and I am still investigating the root cause.

Thanks,
Rushabh

From: P. Taylor Goetz [mailto:ptgoetz@gmail.com]
Sent: Tuesday, August 05, 2014 10:57 AM
To: user@storm.incubator.apache.org
Subject: Re: Storm workers not starting because of netty reconnect : [INFO] Reconnect started for Netty-Client

I would double check to make sure hostname resolution is working properly on all hosts in the cluster, and that there are not any firewall rules that would prevent connections on the supervisor ports.

I would also remove any Netty configuration overrides from storm.yaml to allow the defaults to take effect - only override the defaults when you need to.

- Taylor


On Aug 4, 2014, at 2:37 PM, Rushabh Shah <ru...@integral.com>> wrote:


Hi,
I have a topology that was deployed on a storm cluster and was running fine until I started facing the following issue.

I can see that in supervisor logs, the supervisor is trying to launch the topology on a worker but it is not able to start it.



2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] [1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6702 with id f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with command: '/integral/opt/jdk16/bin/java' '-server' '-Xmx768m' '-Djava.net.preferIPv4Stack=true' '-Djava.net.preferIPv4Stack=true' '-Xmanagement:ssl=false,authenticate=false,port=7099' '-Xmx8192m' '-Djava.library.path=/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources/Linux-amd64:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6702.log' '-Dstorm.home=/integral/opt/apache-storm-0.9.2-incubating' '-Dlogback.configurationFile=/integral/opt/apache-storm-0.9.2-incubating/logback/cluster.xml' '-Dstorm.id=SALABPOSITION-5-1-2-1406938773' '-Dworker.id=f620ab27-61fd-4b87-b017-dea1e811074b' '-Dworker.port=6702' '-cp' '/integral/opt/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/integral/opt/apache-storm-0.9.2-incubating/conf:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/stormjar.jar' 'backtype.storm.daemon.worker' 'SALABPOSITION-5-1-2-1406938773' '48753f4c-e0fd-48f3-a149-1f52491da5b9' '6702' 'f620ab27-61fd-4b87-b017-dea1e811074b'
2014-08-04 18:27:33 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:33 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:34 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
.....

After 120 seconds the supervisor will timeout and try to start the topology on another worker.


2014-08-04 18:29:32 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:32 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Worker f620ab27-61fd-4b87-b017-dea1e811074b failed to start
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down and clearing state for id f620ab27-61fd-4b87-b017-dea1e811074b. Current supervisor time: 1407176973. State: :not-started, Heartbeat: nil
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down 48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shut down 48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] [1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6703 with id c290b2ec-7969-44ca-ac3e-008b8841ef3f


And this process keeps on repeating.


On the worker logs, I see the following :


2014-08-04 08:09:53 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>... [14]
2014-08-04 08:09:54 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>... [15]
2014-08-04 08:09:55 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>... [16]
......
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>..., timeout: 600000ms, pendings: 0
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6701<http://netty-client-supervisor2.integral.com/192.168.239.166:6701>... [0]
2014-08-04 08:10:10 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:194) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:323) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 6 common frames omitted
2014-08-04 08:10:10 b.s.util [INFO] Halting process: ("Async loop died!")


It seems that the supervisor is not able to communicate with the workers because of some netty connection issues.

I would appreciate if somebody can help me in this regard.

Thanks,
Rushabh


Re: Storm workers not starting because of netty reconnect : [INFO] Reconnect started for Netty-Client

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
I would double check to make sure hostname resolution is working properly on all hosts in the cluster, and that there are not any firewall rules that would prevent connections on the supervisor ports.

I would also remove any Netty configuration overrides from storm.yaml to allow the defaults to take effect — only override the defaults when you need to.

- Taylor


On Aug 4, 2014, at 2:37 PM, Rushabh Shah <ru...@integral.com> wrote:

> Hi,
> I have a topology that was deployed on a storm cluster and was running fine until I started facing the following issue.
>  
> I can see that in supervisor logs, the supervisor is trying to launch the topology on a worker but it is not able to start it.
>  
>  
>  
> 2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] [1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6702 with id f620ab27-61fd-4b87-b017-dea1e811074b
> 2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with command: '/integral/opt/jdk16/bin/java' '-server' '-Xmx768m' '-Djava.net.preferIPv4Stack=true' '-Djava.net.preferIPv4Stack=true' '-Xmanagement:ssl=false,authenticate=false,port=7099' '-Xmx8192m' '-Djava.library.path=/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources/Linux-amd64:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6702.log' '-Dstorm.home=/integral/opt/apache-storm-0.9.2-incubating' '-Dlogback.configurationFile=/integral/opt/apache-storm-0.9.2-incubating/logback/cluster.xml' '-Dstorm.id=SALABPOSITION-5-1-2-1406938773' '-Dworker.id=f620ab27-61fd-4b87-b017-dea1e811074b' '-Dworker.port=6702' '-cp' '/integral/opt/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/integral/opt/apache-storm-0.9.2-incubating/conf:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/stormjar.jar' 'backtype.storm.daemon.worker' 'SALABPOSITION-5-1-2-1406938773' '48753f4c-e0fd-48f3-a149-1f52491da5b9' '6702' 'f620ab27-61fd-4b87-b017-dea1e811074b'
> 2014-08-04 18:27:33 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
> 2014-08-04 18:27:33 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
> 2014-08-04 18:27:34 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
> …..
>  
> After 120 seconds the supervisor will timeout and try to start the topology on another worker.
>  
>  
> 2014-08-04 18:29:32 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
> 2014-08-04 18:29:32 b.s.d.supervisor [INFO] f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
> 2014-08-04 18:29:33 b.s.d.supervisor [INFO] Worker f620ab27-61fd-4b87-b017-dea1e811074b failed to start
> 2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down and clearing state for id f620ab27-61fd-4b87-b017-dea1e811074b. Current supervisor time: 1407176973. State: :not-started, Heartbeat: nil
> 2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down 48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
> 2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shut down 48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
> 2014-08-04 18:29:33 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] [1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6703 with id c290b2ec-7969-44ca-ac3e-008b8841ef3f
>  
>  
> And this process keeps on repeating.
>  
>  
> On the worker logs, I see the following :
>  
>  
> 2014-08-04 08:09:53 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [14]
> 2014-08-04 08:09:54 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [15]
> 2014-08-04 08:09:55 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [16]
> ……
> 2014-08-04 08:10:10 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-supervisor2.integral.com/192.168.239.166:6703
> 2014-08-04 08:10:10 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-supervisor2.integral.com/192.168.239.166:6703..., timeout: 600000ms, pendings: 0
> 2014-08-04 08:10:10 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-supervisor2.integral.com/192.168.239.166:6701... [0]
> 2014-08-04 08:10:10 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>         at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
> Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:194) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:323) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>         ... 6 common frames omitted
> 2014-08-04 08:10:10 b.s.util [INFO] Halting process: ("Async loop died!")
>  
>  
> It seems that the supervisor is not able to communicate with the workers because of some netty connection issues.
>  
> I would appreciate if somebody can help me in this regard.
>  
> Thanks,
> Rushabh