You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Prasanna Padmanabhan <pr...@gmail.com> on 2014/04/17 09:20:43 UTC

Netty Reconnects causing worker processes to die

I understand there all several threads on this topic, but wasn't able to
find a definitive solution and so here I go.

I'm using Apache Storm 0.9.1-incubating. I'm also using Trident to fetch
messages from 0.8 kafka cluster using net.wurstmeister.storm:storm-kafka
-0.8-plus:0.4.0.
The cluster when run on local mode works well being able to retrieve
messages from kafka and does the data processing. When run on remote
cluster mode using JZMQ, it works pretty well too. However, I'm unable to
get it to work using Netty. I have been running with the following configs
for Netty.

storm.messaging.netty.buffer_size       5242880
storm.messaging.netty.client_worker_threads     1
storm.messaging.netty.max_retries       5
storm.messaging.netty.max_wait_ms       1000
storm.messaging.netty.min_wait_ms       100
storm.messaging.netty.server_worker_threads     1
storm.messaging.transport
backtype.storm.messaging.netty.Context

Here is the exception I see in the worker node:

2014-04-17 07:01:03 b.s.d.executor [INFO] Processing received message
source: __system:-1, stream: __tick, id: {}, [5]
2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [2]
2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [3]
2014-04-17 07:01:05 b.s.m.n.Client [INFO] Reconnect ... [4]
*2014-04-17 07:01:06 b.s.m.n.Client [INFO] Reconnect ... [5]*
*2014-04-17 07:01:06 b.s.m.n.Client [WARN] Remote address is not reachable.
We will close this client.*
.
.
*2014-04-17 07:01:23 b.s.util [ERROR] Async loop died!*
*java.lang.RuntimeException: java.lang.RuntimeException: Client is being
closed, and does not take requests any more*
        at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:107)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:78)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:77)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at
backtype.storm.disruptor$consume_loop_STAR_$fn__6954.invoke(disruptor.clj:89)
~[na:na]
        at backtype.storm.util$async_loop$fn__5761.invoke(util.clj:433)
~[na:na]
        at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
        at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.lang.RuntimeException: Client is being closed, and does not
take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:125)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__9966$fn__9967.invoke(worker.clj:319)
~[na:na]
        at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__9966.invoke(worker.clj:308)
~[na:na]
        at
backtype.storm.disruptor$clojure_handler$reify__6937.onEvent(disruptor.clj:58)
~[na:na]
        at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:104)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        ... 6 common frames omitted
2014-04-17 07:01:23 b.s.util [INFO] Halting process: ("Async loop died!")

*Things I have already tried with no luck:*
- The connectivity to Zookeeper is good. I cleared off all the states
before a run and can see Zookeeper updated with the right storms,
supervisor states etc.
- The topology names and stream names are unique
- Tried a lower value for storm.messaging.netty.max_retries = 5
- Nimbus thrift port 6627 is open and accessible.

*Question:*
Where does Netty try to connect for me to get "*b.s.m.n.Client [WARN]
Remote address is not reachable. We will close this client."*

I would greatly appreciate if you folks can point me to the right direction.

Thanks,
Prasanna

Re: Netty Reconnects causing worker processes to die

Posted by Prasanna Padmanabhan <pr...@gmail.com>.
Turns out all we had to do is to increase storm.messaging.netty.min_wait_ms
and storm.messaging.netty.max_wait_ms in our AWS instances.

Thanks,
Prasanna


On Thu, Apr 17, 2014 at 12:20 AM, Prasanna Padmanabhan
<pr...@gmail.com>wrote:

> I understand there all several threads on this topic, but wasn't able to
> find a definitive solution and so here I go.
>
> I'm using Apache Storm 0.9.1-incubating. I'm also using Trident to fetch
> messages from 0.8 kafka cluster using net.wurstmeister.storm:storm-kafka
> -0.8-plus:0.4.0.
> The cluster when run on local mode works well being able to retrieve
> messages from kafka and does the data processing. When run on remote
> cluster mode using JZMQ, it works pretty well too. However, I'm unable to
> get it to work using Netty. I have been running with the following configs
> for Netty.
>
> storm.messaging.netty.buffer_size       5242880
> storm.messaging.netty.client_worker_threads     1
> storm.messaging.netty.max_retries       5
> storm.messaging.netty.max_wait_ms       1000
> storm.messaging.netty.min_wait_ms       100
> storm.messaging.netty.server_worker_threads     1
> storm.messaging.transport
> backtype.storm.messaging.netty.Context
>
> Here is the exception I see in the worker node:
>
> 2014-04-17 07:01:03 b.s.d.executor [INFO] Processing received message
> source: __system:-1, stream: __tick, id: {}, [5]
> 2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [1]
> 2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [2]
> 2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [3]
> 2014-04-17 07:01:05 b.s.m.n.Client [INFO] Reconnect ... [4]
> *2014-04-17 07:01:06 b.s.m.n.Client [INFO] Reconnect ... [5]*
> *2014-04-17 07:01:06 b.s.m.n.Client [WARN] Remote address is not
> reachable. We will close this client.*
> .
> .
> *2014-04-17 07:01:23 b.s.util [ERROR] Async loop died!*
> *java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more*
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:107)
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:78)
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:77)
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__6954.invoke(disruptor.clj:89)
> ~[na:na]
>         at backtype.storm.util$async_loop$fn__5761.invoke(util.clj:433)
> ~[na:na]
>         at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
>         at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:125)
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__9966$fn__9967.invoke(worker.clj:319)
> ~[na:na]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__9966.invoke(worker.clj:308)
> ~[na:na]
>         at
> backtype.storm.disruptor$clojure_handler$reify__6937.onEvent(disruptor.clj:58)
> ~[na:na]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:104)
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         ... 6 common frames omitted
> 2014-04-17 07:01:23 b.s.util [INFO] Halting process: ("Async loop died!")
>
> *Things I have already tried with no luck:*
> - The connectivity to Zookeeper is good. I cleared off all the states
> before a run and can see Zookeeper updated with the right storms,
> supervisor states etc.
> - The topology names and stream names are unique
> - Tried a lower value for storm.messaging.netty.max_retries = 5
> - Nimbus thrift port 6627 is open and accessible.
>
> *Question:*
> Where does Netty try to connect for me to get "*b.s.m.n.Client [WARN]
> Remote address is not reachable. We will close this client."*
>
> I would greatly appreciate if you folks can point me to the right
> direction.
>
> Thanks,
> Prasanna
>