Posted to user@storm.apache.org by ujfjhz <uj...@gmail.com> on 2016/05/09 02:34:56 UTC

Defunct workers still hold ports

Hi,

There are some defunct workers in my Storm cluster (version 0.9.5):
deploy    1634     1  0  2015 ?        07:11:45 [java] <defunct>
deploy    5607     1  2 Mar25 ?        23:59:26 [java] <defunct>
deploy    9154     1  2 Jan13 ?        3-05:31:28 [java] <defunct>
deploy   14292     1  4 Mar11 ?        2-20:59:31 [java] <defunct>

These dead Java processes still hold the worker ports. Taking process 5607
as an example:
$ lsof -i TCP:6704
COMMAND   PID   USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
java     5607 deploy   71u  IPv4  659563503      0t0  TCP *:6704 (LISTEN)

One thread (LWP 5974) of the defunct process is still alive:
$ ps -efL | grep 5607
deploy    1630 20886  1630  0    1 10:26 pts/1    00:00:00 grep 5607
deploy    5607     1  5607  0    2 Mar25 ?        00:00:00 [java] <defunct>
deploy    5607     1  5974  0    2 Mar25 ?        01:37:32 [java]
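
The same per-thread view can be read directly from /proc. This is a minimal sketch, assuming Linux with /proc mounted. thread_states is a hypothetical helper name, and its optional second argument (an alternate proc root) exists only for illustration:

```shell
# Print "<tid> <state>" for every thread (LWP) of a process.
# Assumption: Linux-style /proc/<pid>/task/<tid>/status files.
# thread_states is a hypothetical helper, not part of Storm or procps.
thread_states() {
    pid="$1"
    proc_root="${2:-/proc}"   # optional override, defaults to /proc
    for status in "$proc_root/$pid"/task/*/status; do
        # Skip if the glob matched nothing or the thread has already gone
        [ -r "$status" ] || continue
        tid=$(basename "$(dirname "$status")")
        # The State line looks like "State:  Z (zombie)"; keep the letter only
        state=$(awk '/^State:/ {print $2; exit}' "$status")
        echo "$tid $state"
    done
}
```

Running `thread_states 5607` should list the thread group leader and any remaining threads, each with its one-letter state (Z for zombie, S for sleeping, D for uninterruptible sleep, and so on).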

So when a new assignment arrives, creating the new worker fails:

2016-05-06T11:27:04.143+0800 b.s.d.worker [INFO] Reading Assignments.
2016-05-06T11:27:04.202+0800 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
2016-05-06T11:27:04.394+0800 b.s.d.worker [INFO] Launching receive-thread for 3278773a-4bca-4a53-a845-3668dfe089ee:6704
2016-05-06T11:27:04.409+0800 b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-6704, buffer_size: 5242880, maxWorkers: 1
2016-05-06T11:27:04.449+0800 b.s.d.worker [ERROR] Error on initialization of server mk-worker
org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6704
    at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.netty.Context.bind(Context.java:75) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:378) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.worker$fn__6959$exec_fn__1103__auto____6960.invoke(worker.clj:413) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__6959$mk_worker__7015.doInvoke(worker.clj:391) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:502) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method) ~[na:1.6.0_35]
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) ~[na:1.6.0_35]
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) ~[na:1.6.0_35]
    at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.5.jar:0.9.5]
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_35]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_35]
    at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_35]
2016-05-06T11:27:04.471+0800 b.s.util [ERROR] Halting process: ("Error on initialization")


My questions are:
1) Why do these defunct workers still hold the ports?
2) How can I release the ports held by the defunct workers?

Thank you.