Posted to user@storm.apache.org by Daniel Schonfeld <do...@gmail.com> on 2015/02/19 15:51:40 UTC
Client is being closed, and does not take requests any more

Hello,

We're seeing a weird occurrence where, seemingly out of nowhere, one of
the Netty clients inside a worker throws the following exception:

2015-02-19 07:12:02 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [5000] the maxRetries [10]
2015-02-19 07:14:51 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
        ... 6 common frames omitted

The StormBoundedExponentialBackoffRetry log line makes me wonder whether
there is a race condition in worker.clj between mk-refresh-connections and
the handler function created by mk-transfer-tuples-handler. Between the time
the old connections are swapped out of the hash map and the time close() is
called on them, a batch of messages might still be sent through one of those
connections, causing the worker to crash.
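
To make the suspected interleaving concrete, here is a minimal Java sketch of
it. This is not Storm's code: FakeClient, the "node-1" key, and the
AtomicReference-held map are hypothetical stand-ins, and the steps are replayed
on a single thread in the order I suspect two threads could interleave. Only
the exception message is taken from the log above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class CloseRaceSketch {
    /** Hypothetical stand-in for a Netty client; not Storm's actual class. */
    static final class FakeClient {
        private volatile boolean closing = false;

        void send(String batch) {
            if (closing) {
                // Message copied from the stack trace in this post.
                throw new RuntimeException(
                    "Client is being closed, and does not take requests any more");
            }
        }

        void close() {
            closing = true;
        }
    }

    /** Replays the suspected interleaving step by step and reports the outcome. */
    static String simulate() {
        // Assumed shape: a map of connections that the refresher replaces wholesale.
        Map<String, FakeClient> initial = new HashMap<>();
        initial.put("node-1", new FakeClient());
        AtomicReference<Map<String, FakeClient>> connections =
            new AtomicReference<>(initial);

        // Step 1 (transfer thread): look up the client for the next batch.
        FakeClient inFlight = connections.get().get("node-1");

        // Step 2 (refresh thread): swap the old connections out, then close them.
        Map<String, FakeClient> old = connections.getAndSet(new HashMap<>());
        for (FakeClient client : old.values()) {
            client.close();
        }

        // Step 3 (transfer thread): send on the reference obtained in step 1.
        try {
            inFlight.send("tuple-batch");
            return "sent";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this ordering the sender still holds a reference to a client that has
already been closed, so the send fails even though the map swap itself was
atomic. Whether worker.clj can actually interleave this way is exactly what I
am unsure about.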

If so, that makes STORM-414 and subsequently STORM-329 even worse.

Any thoughts about this?

Thanks!
Daniel Schonfeld