Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 03:09:27 UTC

[jira] [Reopened] (STORM-840) My supervisor crashes when I kill a topology

     [ https://issues.apache.org/jira/browse/STORM-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg reopened STORM-840:
--------------------------------

> My supervisor crashes when I kill a topology
> --------------------------------------------
>
>                 Key: STORM-840
>                 URL: https://issues.apache.org/jira/browse/STORM-840
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.4
>         Environment: I have a test cluster of 3 servers based on Debian.
> Each server runs Storm inside a Docker container.
> 2 servers run only a supervisor.
> 1 server runs nimbus + UI + supervisor.
> I use Oracle JVM 8u45.
>            Reporter: Damien DESMARETS
>              Labels: crash, stability
>
> Hello,
> I run 3 topologies in my cluster.
> Sometimes, when I kill one of them (no specific one), one supervisor goes down and restarts. After a few restarts, it becomes stable.
> The topology process is left in a "zombie" state in the process list.
> In version 0.9.3, all the supervisors crashed and couldn't restart. To resolve this, I had to run "rm -fr <storm-local-dir>/workers/".
> So I migrated to 0.9.4 (I thought the cause was STORM-682).
> Now it still happens, not every time but occasionally.
> I see these logs in supervisor.log:
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id nlp-11-1432906756
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id nlp-11-1432906756
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time: 1432911702. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time: 1432911702. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
> 2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
> java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_45]
>         at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
>         at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
>         at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.util$send_signal_to_process.invoke(util.clj:415) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267) ~[storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>         at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>         at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>         at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
>         at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>         at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.io.IOException: error=2, No such file or directory
>         at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
>         at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_45]
>         ... 19 common frames omitted
> 2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
> java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_45]
>         at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
>         at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[commons-exec-1.1.jar:1.1]
>         at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
>         at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.util$send_signal_to_process.invoke(util.clj:415) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197) ~[storm-core-0.9.4.jar:0.9.4]
>         at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267) ~[storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>         at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>         at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>         at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
>         at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>         at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.io.IOException: error=2, No such file or directory
>         at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
>         at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_45]
>         ... 19 common frames omitted
> 2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing an event")
> java.lang.RuntimeException: ("Error when processing an event")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>         at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> 2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing an event")
> java.lang.RuntimeException: ("Error when processing an event")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>         at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor 90f0964b-c48c-4cbc-9d1c-57119c56e99c
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor 90f0964b-c48c-4cbc-9d1c-57119c56e99c
> 2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
> 2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:host.name=storm-supervisor-01
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.version=1.8.0_45
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/jre-8-oracle-x64/jre
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.class.path=/usr/share/apache-storm-0.9.4/lib/zookeeper-3.4.6.jar:/usr/share/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/share/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/share/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/share/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/share/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/share/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/share/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/share/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/share/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/share/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/share/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/share/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/share/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/share/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/share/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/share/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/share/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/share/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/share/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/share/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/share/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/share/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/share/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/share/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/share/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/share/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/share/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/share/apache-storm-0.9.4/conf
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.name=Linux
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.arch=amd64
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.version=3.16.0-0.bpo.4-amd64
> ...
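
Note on the "error=2" in the trace above: errno 2 is ENOENT ("No such file or directory"). On POSIX systems that errno is returned both when a program name cannot be resolved and when the working directory requested for the child process does not exist, so the message alone does not say which of the two happened here. The report does not establish a root cause. As a minimal sketch for telling the two cases apart inside the supervisor's container (this is Python, not the supervisor's JVM code path; the helper name try_launch, the program name and the directory path are all hypothetical):

```python
import errno
import shutil
import subprocess
import sys


def try_launch(argv, cwd=None):
    """Run argv without a shell; return None if it started, else the errno."""
    try:
        subprocess.run(argv, cwd=cwd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        return None
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    # Does "kill" resolve on this process's PATH at all?
    print("kill resolves to:", shutil.which("kill"))

    # Case 1: a program name that does not exist on PATH (hypothetical name).
    missing_program = try_launch(["no-such-program-storm-840"])
    print("missing program -> ENOENT:", missing_program == errno.ENOENT)

    # Case 2: a valid program, but a working directory that does not exist
    # (hypothetical path).
    missing_cwd = try_launch([sys.executable, "-c", "pass"],
                             cwd="/no/such/dir-storm-840")
    print("missing cwd -> ENOENT:", missing_cwd == errno.ENOENT)
```

If shutil.which("kill") returns None in the supervisor's environment, the PATH is probably the first thing to check. If it does resolve, the working directory of the supervisor process may be worth inspecting instead.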


