Posted to issues@storm.apache.org by "Jorge Moraleda (JIRA)" <ji...@apache.org> on 2018/04/29 03:40:00 UTC

[jira] [Commented] (STORM-3039) Ports of killed topologies remain in TIME_WAIT state preventing to start new topology

    [ https://issues.apache.org/jira/browse/STORM-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457901#comment-16457901 ] 

Jorge Moraleda commented on STORM-3039:
---------------------------------------

I have observed this problem in 1.2.1, and I have found a scenario that makes it relatively reproducible: include a bolt that consistently throws in its _prepare_ method, so that the topology is continuously being restarted.
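
For reference, a minimal sketch of such a bolt against the Storm 1.x API (the class name is made up; any bolt whose _prepare_ throws unconditionally will trigger the restart loop):

{code:java}
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

// Hypothetical bolt used only to reproduce the scenario described above:
// throwing from prepare() makes the worker die and be relaunched repeatedly.
public class FailingPrepareBolt extends BaseRichBolt {

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        // Simulate an initialization failure, e.g. an unreachable external resource.
        throw new RuntimeException("simulated prepare() failure");
    }

    @Override
    public void execute(Tuple input) {
        // Never reached because prepare() always fails.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output streams.
    }
}
{code}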

In that situation, when the topology is killed manually (e.g. via the UI) and resubmitted, there is a significant probability that the new topology will fail to start with the error shown in the original report.

The new topology will retry every two minutes and will eventually bind to a different port (typically 6702), but the process can repeat, with more and more ports stuck in TIME_WAIT and each new topology taking longer and longer to start.
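
To see which slot ports are affected on a supervisor host, a small stand-alone check such as the following can be run (this is not part of Storm; the port list is an assumption and should match supervisor.slots.ports):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class SlotPortCheck {
    public static void main(String[] args) {
        // Default supervisor.slots.ports values; adjust to the cluster's configuration.
        int[] slotPorts = {6700, 6701, 6702, 6703};
        for (int port : slotPorts) {
            try (ServerSocket socket = new ServerSocket()) {
                // Bind without SO_REUSEADDR so that lingering TIME_WAIT sockets
                // on the port cause the same "Address already in use" failure.
                socket.setReuseAddress(false);
                socket.bind(new InetSocketAddress(port));
                System.out.println(port + " is free");
            } catch (IOException e) {
                System.out.println(port + " cannot be bound: " + e.getMessage());
            }
        }
    }
}
{code}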

> Ports of killed topologies remain in TIME_WAIT state preventing to start new topology
> -------------------------------------------------------------------------------------
>
>                 Key: STORM-3039
>                 URL: https://issues.apache.org/jira/browse/STORM-3039
>             Project: Apache Storm
>          Issue Type: Improvement
>    Affects Versions: 1.1.2, 1.2.1
>            Reporter: Gergely Hajós
>            Assignee: Gergely Hajós
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When a topology is killed, the slot ports (supervisor.slots.ports) remain in TIME_WAIT state. In that case a new topology cannot be started, because the workers throw the following error:
> {code:java}
> 2018-04-20 08:37:08.742 o.a.s.d.worker main [ERROR] Error on initialization of server mk-worker
> org.apache.storm.shade.org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6700
>  at org.apache.storm.shade.org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.messaging.netty.Server.<init>(Server.java:101) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.messaging.netty.Context.bind(Context.java:67) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.daemon.worker$worker_data$fn__10395.invoke(worker.clj:285) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.util$assoc_apply_self.invoke(util.clj:931) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.daemon.worker$worker_data.invoke(worker.clj:282) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.daemon.worker$fn__10693$exec_fn__3301__auto__$reify__10695.run(worker.clj:626) ~[storm-core-1.2.1.jar:1.2.1]
>  at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_161]
>  at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_161]
>  at org.apache.storm.daemon.worker$fn__10693$exec_fn__3301__auto____10694.invoke(worker.clj:624) ~[storm-core-1.2.1.jar:1.2.1]
>  at clojure.lang.AFn.applyToHelper(AFn.java:178) ~[clojure-1.7.0.jar:?]
>  at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.7.0.jar:?]
>  at clojure.core$apply.invoke(core.clj:630) ~[clojure-1.7.0.jar:?]
>  at org.apache.storm.daemon.worker$fn__10693$mk_worker__10784.doInvoke(worker.clj:598) [storm-core-1.2.1.jar:1.2.1]
>  at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.7.0.jar:?]
>  at org.apache.storm.daemon.worker$_main.invoke(worker.clj:787) [storm-core-1.2.1.jar:1.2.1]
>  at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.7.0.jar:?]
>  at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.7.0.jar:?]
>  at org.apache.storm.daemon.worker.main(Unknown Source) [storm-core-1.2.1.jar:1.2.1]
> Caused by: java.net.BindException: Address already in use
>  at sun.nio.ch.Net.bind0(Native Method) ~[?:1.8.0_161]
>  at sun.nio.ch.Net.bind(Net.java:433) ~[?:1.8.0_161]
>  at sun.nio.ch.Net.bind(Net.java:425) ~[?:1.8.0_161]
>  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[?:1.8.0_161]
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[?:1.8.0_161]
>  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-1.2.1.jar:1.2.1]
>  at org.apache.storm.shade.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-1.2.1.jar:1.2.1]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]
> {code}
>  
> This exception often occurs when topologies are stopped and started automatically.
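
For background, the usual way a listening socket can still be bound while earlier connections on the same port linger in TIME_WAIT is to enable SO_REUSEADDR before binding. The sketch below only illustrates that socket option with plain java.nio; it is not the Storm worker code and not necessarily the fix adopted for this issue:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

public class ReuseAddressExample {
    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // SO_REUSEADDR allows the bind to succeed even while previous
            // connections on the same port are still in TIME_WAIT.
            server.setOption(StandardSocketOptions.SO_REUSEADDR, true);
            server.bind(new InetSocketAddress(6700));
            System.out.println("Bound to " + server.getLocalAddress());
        }
    }
}
{code}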



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)