Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 02:51:27 UTC

[jira] [Updated] (STORM-450) Netty can cause error on clean shutdown of worker

     [ https://issues.apache.org/jira/browse/STORM-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-450:
-------------------------------
    Component/s: storm-core

> Netty can cause error on clean shutdown of worker
> -------------------------------------------------
>
>                 Key: STORM-450
>                 URL: https://issues.apache.org/jira/browse/STORM-450
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.2-incubating, 0.9.0.1, 0.9.3
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> We recently had an issue where a worker process was shut down cleanly on 0.9.0.  Why the worker shut down cleanly is not the issue here; the problem is that it caused a cascading failure that made a connected worker shut down too.  This is going to be even more problematic in newer versions of Storm, where we give the worker time to shut down cleanly instead of just shooting it with a kill -9.
> Ideally the client should keep trying to reconnect, because the worker may have exited on its own and will be re-spawned shortly.  If it is rescheduled elsewhere, the connected worker will eventually detect that and reroute things accordingly.  This is what already happens when the connection is just closed.  There really is no reason for one side to know when the other side is shutting down.
> {code}
> 2014-08-11 19:00:17 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
> 	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:130) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:101) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.disruptor$consume_loop_STAR_$fn__1999.invoke(disruptor.clj:74) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.util$async_loop$fn__421.invoke(util.clj:400) ~[storm-core-0.9.0-wip21.jar:na]
> 	at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
> 	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]
> Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
> 	at backtype.storm.messaging.netty.Client.send(Client.java:118) ~[storm-netty-0.9.0-wip21.jar:na]
> 	at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922$fn__4923.invoke(worker.clj:342) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922.invoke(worker.clj:331) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.disruptor$clojure_handler$reify__1986.onEvent(disruptor.clj:43) ~[storm-core-0.9.0-wip21.jar:na]
> 	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:127) ~[storm-core-0.9.0-wip21.jar:na]
> 	... 6 common frames omitted
> 2014-08-11 19:00:17 b.s.util [INFO] Halting process: ("Async loop died!")
> {code}
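
For illustration only, the behavior the report asks for (keep the sender alive and retry the connection instead of throwing when the peer goes away) could be sketched roughly as below. This is not the code in backtype.storm.messaging.netty.Client; the class ReconnectingSender, the Transport interface, and the drop-oldest buffering policy are all hypothetical names and assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; not Storm's backtype.storm.messaging.netty.Client.
class ReconnectingSender {
    /** Hypothetical stand-in for a Netty channel to a remote worker. */
    interface Transport {
        boolean connect();           // true once the remote worker is reachable
        void write(String message);  // sends one message over an open connection
    }

    private final Transport transport;
    private final int maxPending;
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean connected = false;

    ReconnectingSender(Transport transport, int maxPending) {
        this.transport = transport;
        this.maxPending = maxPending;
    }

    /** Called when the remote side closes the connection, cleanly or not. */
    void onDisconnect() {
        connected = false;
    }

    /** Never throws because the peer went away: buffer the message, then try to deliver. */
    void send(String message) {
        if (pending.size() >= maxPending) {
            pending.poll(); // assumption: dropping the oldest message is acceptable here
        }
        pending.add(message);
        flush();
    }

    /** Reconnects if needed, then drains the buffer. Could also be driven by a retry timer. */
    void flush() {
        if (!connected) {
            connected = transport.connect();
        }
        while (connected && !pending.isEmpty()) {
            transport.write(pending.poll());
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The point of the sketch is that a clean shutdown on the remote side is treated the same as a dropped connection, so the sending thread never sees an exception such as the one in the trace above. The bounded buffer and its drop policy are placeholders; a real fix would have to decide what to do with messages that cannot be delivered.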



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)