Posted to user@storm.apache.org by 李家宏 <jh...@gmail.com> on 2014/04/15 05:38:21 UTC

Repost: [storm-netty]-"too many open files exceptions"

Hi all,
I'm running a topology on a Storm 0.9.0.1 cluster with Netty as the
transport layer, and this error occurs:
the Netty client fails to create a selector because of a "too many open
files" exception, and the worker keeps halting with an initialization error.

I checked ulimit -n (> 130000), which is much larger than the number of
currently open fds (sudo lsof | grep java | wc -l), about 6000 at most.
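
One caveat on this kind of measurement: the open-files limit applies per process, while `lsof | grep java` aggregates every Java process and also lists entries (such as memory-mapped files) that are not descriptors. A minimal Python sketch of a per-process check on Linux follows. It assumes `/proc/<pid>/fd` is readable, and the helper names are hypothetical, not part of Storm.

```python
import os
import resource


def count_open_fds(fd_dir):
    """Count descriptor entries in a directory such as /proc/<pid>/fd."""
    return len(os.listdir(fd_dir))


def fd_headroom(open_fds, soft_limit):
    """Descriptors a process can still open before hitting its soft limit."""
    return soft_limit - open_fds


if __name__ == "__main__":
    # This inspects the Python process itself as a stand-in. For a Storm
    # worker, pass /proc/<worker pid>/fd instead and read that worker's own
    # limit from /proc/<worker pid>/limits, since it may differ from the shell's.
    fd_dir = "/proc/self/fd"
    if os.path.isdir(fd_dir):
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        used = count_open_fds(fd_dir)
        print(f"open={used} soft_limit={soft} headroom={fd_headroom(used, soft)}")
```

Comparing the per-process count with the limit the worker actually runs under is a tighter check than comparing a host-wide lsof total with the shell's ulimit.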

By the way, this topology works fine on a Storm 0.8.0 cluster.

What's the problem?

Here is the stack trace:
-------------------------------------------------------------
2014-03-04 20:24:14 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [2]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-03-04 20:24:14 b.s.d.worker [ERROR] Error on initialization of server mk-worker
org.jboss.netty.channel.ChannelException: Failed to create a selector.
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:337) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:95) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:51) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:99) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:69) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:152) ~[netty-3.6.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:134) ~[netty-3.6.3.Final.jar:na]
    at backtype.storm.messaging.netty.Client.<init>(Client.java:54) ~[storm-netty-0.9.0.1.jar:na]
    at backtype.storm.messaging.netty.Context.connect(Context.java:36) ~[storm-netty-0.9.0.1.jar:na]
    at backtype.storm.daemon.worker$mk_refresh_connections$this__5827$iter__5834__5838$fn__5839.invoke(worker.clj:250) ~[storm-core-0.9.0.1.jar:na]
    at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na]
    at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na]
    at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.4.0.jar:na]
    at clojure.lang.RT.next(RT.java:587) ~[clojure-1.4.0.jar:na]
    at clojure.core$next.invoke(core.clj:64) ~[clojure-1.4.0.jar:na]
    at clojure.core$dorun.invoke(core.clj:2726) ~[clojure-1.4.0.jar:na]
    at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na]
    at backtype.storm.daemon.worker$mk_refresh_connections$this__5827.invoke(worker.clj:244) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883.invoke(worker.clj:357) ~[storm-core-0.9.0.1.jar:na]
    at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.4.0.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
    at clojure.core$apply.invoke(core.clj:601) ~[clojure-1.4.0.jar:na]
    at backtype.storm.daemon.worker$fn__5882$mk_worker__5938.doInvoke(worker.clj:329) [storm-core-0.9.0.1.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.4.0.jar:na]
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:439) [storm-core-0.9.0.1.jar:na]
    at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.4.0.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
    at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.0.1.jar:na]
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.initPipe(Native Method) ~[na:1.6.0_38]
    at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) ~[na:1.6.0_38]
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) ~[na:1.6.0_38]
    at java.nio.channels.Selector.open(Selector.java:209) ~[na:1.6.0_38]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:335) ~[netty-3.6.3.Final.jar:na]
    ... 32 common frames omitted
2014-03-04 20:24:14 b.s.util [INFO] Halting process: ("Error on initialization")
--------------------------------------------------------------------------------------------------------------------

Thanks

-- 

======================================================

Gvain

Email: jh.li.em@gmail.com

Re: Repost: [storm-netty]-"too many open files exceptions"

Posted by Bobby Evans <ev...@yahoo-inc.com>.
Yes and no.  Storm establishes the connections based on the compiled
topology, so even though in theory the ports would be exhausted after
about 360 workers, in practice it is a bit harder to do.  That said, it
is still possible for this to happen.  For example, if you had a topology
with 800 workers, 400 spouts, 400 bolts, and a shuffle grouping between
the two, you would probably run into this problem.  The only real way to
avoid this is to have your topology not create a fully connected graph.
We could try to make Netty really lazy about establishing the actual
connection, and have the option of tearing down unused connections, but
that would only work for groupings that have a skewed access pattern;
shuffle tries very hard to make it even.  It would also potentially slow
down the topology a lot.
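
The 800-worker example can be put into rough numbers. The sketch below assumes, hypothetically, one spout or bolt task per worker with every task on its own worker, so that a shuffle grouping makes each upstream worker open a connection to every downstream worker. `shuffle_connections` is an illustrative helper, not Storm code.

```python
def shuffle_connections(upstream_workers, downstream_workers):
    """Connections for a fully connected shuffle between two worker sets.

    Returns (connections opened by each upstream worker, total connections).
    """
    per_upstream = downstream_workers
    total = upstream_workers * downstream_workers
    return per_upstream, total


if __name__ == "__main__":
    per_worker, total = shuffle_connections(400, 400)
    print(f"each spout worker opens {per_worker} connections; {total} in total")
```

Under these assumptions the total grows with the product of the two worker counts, which is why avoiding a fully connected graph is the main lever.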

If this is an issue you are running into, there are things we can try to
look at.

—Bobby




Re: Repost: [storm-netty]-"too many open files exceptions"

Posted by 李家宏 <jh...@gmail.com>.
Hi Evans,

I tried out the latest version of Storm. It uses a shared, non-blocking
thread pool across all Netty clients, which greatly reduces the number of
threads as well as pipes. So far, the "too many open files" exception has
not been thrown again.

One more thing:
To my knowledge, as the number of workers increases, the number of TCP
ports used per worker increases substantially, and the maximum TCP port
usage per worker is twice the number of workers. What's more, one machine
hosts several workers, so the total TCP port usage per machine is
multiplied and can exhaust the machine's TCP ports (fewer than 65536).
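
Taking this estimate at face value (at most twice the number of workers in ports per worker, multiplied by the number of workers on a machine), the arithmetic can be sketched as follows. The function names and the example figures of 400 total workers and 8 workers per machine are assumptions for illustration, and the model does not attempt to distinguish listening ports from ephemeral ones.

```python
PORT_SPACE = 65536  # size of the 16-bit TCP port space


def ports_per_machine(total_workers, workers_per_machine):
    """Upper-bound port usage on one machine under the 2x-workers estimate."""
    ports_per_worker = 2 * total_workers
    return ports_per_worker * workers_per_machine


def exceeds_port_space(total_workers, workers_per_machine):
    """True when the estimate is larger than the available port numbers."""
    return ports_per_machine(total_workers, workers_per_machine) > PORT_SPACE


if __name__ == "__main__":
    # Hypothetical cluster: 400 workers in total, 8 of them on this machine.
    print(ports_per_machine(400, 8), exceeds_port_space(400, 8))
```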

Thanks for your advice.





-- 

======================================================

Gvain

Email: jh.li.em@gmail.com

Re: Repost: [storm-netty]-"too many open files exceptions"

Posted by 李家宏 <jh...@gmail.com>.
Although you reduced the number of Selector instances, Netty still leaks
open file descriptors. As the topology grows much larger, the "too many
open files" exception will inevitably be thrown.




-- 

======================================================

Gvain

Email: jh.li.em@gmail.com

Re: Repost: [storm-netty]-"too many open files exceptions"

Posted by Bobby Evans <ev...@yahoo-inc.com>.
I am rather stumped here.  The code is blowing up creating a pipe as part
of an NIO EPollSelector for Netty to use.  My best advice right now is to
try upgrading to the latest version of Storm.  We have merged in two
fixes, one that relates to closing config files, and one that relates to
Netty.  The fix makes it use fewer threads, but as a part of that I
believe that the number of Selector instances will be smaller too,
although this stack trace is for the client side, not the server side.
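
To see why fewer Selector instances would matter, note that the trace fails in `IOUtil.initPipe`: opening an epoll-based selector on Linux creates a wakeup pipe in addition to the epoll descriptor. The sketch below estimates descriptor usage from selectors alone. The figure of three descriptors per selector (two pipe ends plus one epoll descriptor) and the example selector counts are assumptions to verify against your own JDK and Netty versions.

```python
FDS_PER_SELECTOR = 3  # assumed: 2 pipe ends + 1 epoll descriptor


def per_client_selector_fds(netty_clients, selectors_per_client,
                            fds_per_selector=FDS_PER_SELECTOR):
    """Descriptors used when every client owns its own set of selectors."""
    return netty_clients * selectors_per_client * fds_per_selector


def shared_selector_fds(shared_selectors, fds_per_selector=FDS_PER_SELECTOR):
    """Descriptors used when all clients share one pool of selectors."""
    return shared_selectors * fds_per_selector


if __name__ == "__main__":
    # Hypothetical worker: 100 clients, 17 selectors per client or in the pool.
    print(per_client_selector_fds(100, 17), shared_selector_fds(17))
```

With per-client selectors the count scales with the number of connections a worker makes; with a shared pool it stays constant.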

―Bobby

>~[netty-3.6.3.Final.jar:na]
>   at
>org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.(NioClien
>tSocketChannelFactory.java:152)
>   ~[netty-3.6.3.Final.jar:na]
>   at
>org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.(NioClien
>tSocketChannelFactory.java:134)
>   ~[netty-3.6.3.Final.jar:na]
>   at backtype.storm.messaging.netty.Client.(Client.java:54)
>~[storm-netty-0.9.0.1.jar:na]
>   at backtype.storm.messaging.netty.Context.connect(Context.java:36)
>~[storm-netty-0.9.0.1.jar:na]
>   at
>backtype.storm.daemon.worker$mk_refresh_connections$this__5827$iter__5834_
>_5838$fn__5839.invoke(worker.clj:250)
>   ~[storm-core-0.9.0.1.jar:na]
>   at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na]
>   at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na]
>   at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.4.0.jar:na]
>   at clojure.lang.RT.next(RT.java:587) ~[clojure-1.4.0.jar:na]
>   at clojure.core$next.invoke(core.clj:64) ~[clojure-1.4.0.jar:na]
>   at clojure.core$dorun.invoke(core.clj:2726) ~[clojure-1.4.0.jar:na]
>   at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na]
>   at
>backtype.storm.daemon.worker$mk_refresh_connections$this__5827.invoke(work
>er.clj:244)
>~[storm-core-0.9.0.1.jar:na]
>   at
>backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883.invoke(w
>orker.clj:357)
>   ~[storm-core-0.9.0.1.jar:na]
>   at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.4.0.jar:na]
>   at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>   at clojure.core$apply.invoke(core.clj:601) ~[clojure-1.4.0.jar:na]
>   at
>backtype.storm.daemon.worker$fn__5882$mk_worker__5938.doInvoke(worker.clj:
>329)
>[storm-core-0.9.0.1.jar:na]
>   at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.4.0.jar:na]
>   at backtype.storm.daemon.worker$_main.invoke(worker.clj:439)
>[storm-core-0.9.0.1.jar:na]
>   at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.4.0.jar:na]
>   at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>   at backtype.storm.daemon.worker.main(Unknown Source)
>[storm-core-0.9.0.1.jar:na]
>
>  * Caused by: java.io.IOException: Too many open files*
>
>   at sun.nio.ch.IOUtil.initPipe(Native Method) ~[na:1.6.0_38]
>   at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:49)
>~[na:1.6.0_38]
>   at
>sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:1
>8)
>~[na:1.6.0_38]
>   at java.nio.channels.Selector.open(Selector.java:209) ~[na:1.6.0_38]
>   at
>org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(Abstra
>ctNioSelector.java:335)
>   ~[netty-3.6.3.Final.jar:na]
>   ... 32 common frames omitted
>   2014-03-04 20:24:14 b.s.util [INFO] Halting process: ("Error on
>initialization")
>--------------------------------------------------------------------------
>------------------------------------------
>
>Thanks
>
>-- 
>
>======================================================
>
>Gvain
>
>Email: jh.li.em@gmail.com