Posted to user@storm.apache.org by ujfjhz <uj...@gmail.com> on 2016/05/09 02:34:56 UTC
Defunct workers still hold ports
Hi,
There are some defunct workers in my Storm cluster (version 0.9.5):
deploy 1634 1 0 2015 ? 07:11:45 [java] <defunct>
deploy 5607 1 2 Mar25 ? 23:59:26 [java] <defunct>
deploy 9154 1 2 Jan13 ? 3-05:31:28 [java] <defunct>
deploy 14292 1 4 Mar11 ? 2-20:59:31 [java] <defunct>
These dead Java processes still hold the worker ports. Take process 5607
as an example:
$ lsof -i TCP:6704
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 5607 deploy 71u IPv4 659563503 0t0 TCP *:6704 (LISTEN)
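On a host without lsof, roughly the same lookup can be sketched against /proc/net/tcp. This assumes Linux; the helper names port_hex and listen_inodes are made up for this illustration and are not part of Storm or any standard tool.

```shell
#!/bin/sh
# Convert a decimal port to the 4-digit uppercase hex form used in /proc/net/tcp.
port_hex() {
  printf '%04X\n' "$1"
}

# Print the socket inode of every LISTEN entry (state 0A) on the given port.
# Usage: listen_inodes PORT [FILE...]
# FILE defaults to /proc/net/tcp and /proc/net/tcp6 (Linux assumed).
listen_inodes() {
  hex=$(port_hex "$1")
  shift
  [ "$#" -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
  for t in "$@"; do
    [ -r "$t" ] || continue
    # Field 2 is local_address (ADDR:PORT in hex), field 4 is the state,
    # field 10 is the socket inode.
    awk -v p=":$hex" \
      '$4 == "0A" && substr($2, length($2) - 4) == p { print $10 }' "$t"
  done
}

# Example (not run here): listen_inodes 6704
```

Each inode printed can then be compared with the socket:[inode] links under /proc/<pid>/fd to see which process owns the listening socket.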
A thread of the defunct process is still alive:
$ ps -efL | grep 5607
deploy 1630 20886 1630 0 1 10:26 pts/1 00:00:00 grep 5607
deploy 5607 1 5607 0 2 Mar25 ? 00:00:00 [java] <defunct>
deploy 5607 1 5974 0 2 Mar25 ? 01:37:32 [java]
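The per-thread state can also be read straight from /proc, which shows whether the thread-group leader is a zombie (Z) while other threads are still running or sleeping. A minimal sketch, assuming Linux; stat_state and thread_states are hypothetical helper names used only for this illustration.

```shell
#!/bin/sh
# Extract the state letter (third field) from one /proc/<pid>/task/<tid>/stat line.
# The comm field is wrapped in parentheses and may itself contain spaces,
# so everything up to the last ") " is stripped first.
stat_state() {
  rest=${1##*) }
  printf '%s\n' "${rest%% *}"
}

# Print "<tid> <state>" for every thread of the given PID.
# Prints nothing if the PID does not exist or /proc is unavailable.
thread_states() {
  for f in /proc/"$1"/task/*/stat; do
    [ -r "$f" ] || continue
    IFS= read -r line < "$f" || continue
    tid=${f%/stat}
    tid=${tid##*/}
    printf '%s %s\n' "$tid" "$(stat_state "$line")"
  done
}

# Example (not run here): thread_states 5607
```

If the leader thread reports Z while another thread reports a different state, the process as a whole has not exited yet, which would be consistent with its listening socket staying open.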
So when a new assignment arrives, creating the new worker fails:
2016-05-06T11:27:04.143+0800 b.s.d.worker [INFO] Reading Assignments.
2016-05-06T11:27:04.202+0800 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
2016-05-06T11:27:04.394+0800 b.s.d.worker [INFO] Launching receive-thread for 3278773a-4bca-4a53-a845-3668dfe089ee:6704
2016-05-06T11:27:04.409+0800 b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-6704, buffer_size: 5242880, maxWorkers: 1
2016-05-06T11:27:04.449+0800 b.s.d.worker [ERROR] Error on initialization of server mk-worker
org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6704
at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.messaging.netty.Context.bind(Context.java:75) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:378) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.worker$fn__6959$exec_fn__1103__auto____6960.invoke(worker.clj:413) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$fn__6959$mk_worker__7015.doInvoke(worker.clj:391) [storm-core-0.9.5.jar:0.9.5]
at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$_main.invoke(worker.clj:502) [storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method) ~[na:1.6.0_35]
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) ~[na:1.6.0_35]
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) ~[na:1.6.0_35]
at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.5.jar:0.9.5]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_35]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_35]
at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_35]
2016-05-06T11:27:04.471+0800 b.s.util [ERROR] Halting process: ("Error on initialization")
My questions are:
1) Why do these defunct workers still hold the ports?
2) How can I release the ports held by the defunct workers?
Thank you.