Posted to dev@storm.apache.org by "Damien DESMARETS (JIRA)" <ji...@apache.org> on 2015/05/29 17:23:17 UTC

[jira] [Created] (STORM-840) My supervisor crashes when I kill a topology

Damien DESMARETS created STORM-840:
--------------------------------------

             Summary: My supervisor crashes when I kill a topology
                 Key: STORM-840
                 URL: https://issues.apache.org/jira/browse/STORM-840
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 0.9.4
         Environment: I have a test cluster of 3 servers based on Debian.
Each server runs Storm inside a Docker container.

2 servers run only a supervisor.
1 server runs nimbus + UI + supervisor.

I use Oracle JVM 8u45. 
            Reporter: Damien DESMARETS


Hello,
I run 3 topologies inside my cluster.
Sometimes, when I kill one of them (no specific one), one supervisor goes down and restarts. After a few restarts, it becomes stable.
The topology process is in "Zombie state" in the process list.
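
To confirm the zombie state programmatically rather than by reading the process list, one option on Linux is to look at the state field of /proc/<pid>/stat, where 'Z' marks a zombie. The following is only a minimal sketch under that assumption. It is not part of Storm, and the class name ZombieCheck and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ZombieCheck {
    /**
     * Extracts the process state character from the content of /proc/<pid>/stat.
     * The content looks like "pid (comm) state ppid ...". The command name may
     * itself contain spaces or parentheses, so we search from the last ')'.
     */
    public static char parseState(String statLine) {
        int close = statLine.lastIndexOf(')');
        if (close < 0 || close + 2 >= statLine.length()) {
            throw new IllegalArgumentException("Unexpected stat format: " + statLine);
        }
        return statLine.charAt(close + 2);
    }

    /** Returns true if the state field is 'Z' (zombie, shown as "defunct" by ps). */
    public static boolean isZombie(String statLine) {
        return parseState(statLine) == 'Z';
    }

    /** Linux only: reads /proc/<pid>/stat for the given pid. */
    public static boolean isZombie(long pid) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc", Long.toString(pid), "stat"));
        return isZombie(new String(raw, StandardCharsets.UTF_8).trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZombieCheck <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        System.out.println(pid + (isZombie(pid) ? " is a zombie" : " is not a zombie"));
    }
}
```

A zombie is a process that has already exited but has not yet been reaped by its parent, so this check mainly tells you whether the parent is still failing to collect the worker's exit status.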

In version 0.9.3, all the supervisors crashed and could not restart. To recover, I had to run "rm -fr <storm-local-dir>/workers/".
So I migrated to 0.9.4 (I thought the cause was STORM-682).

The problem still occurs on 0.9.4, but only occasionally rather than every time.

I have these logs inside supervisor.log:
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id nlp-11-1432906756
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id nlp-11-1432906756
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time: 1432911702. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time: 1432911702. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_45]
        at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
        at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
        at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.util$send_signal_to_process.invoke(util.clj:415) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267) ~[storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
        at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
        at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
        at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_45]
        ... 19 common frames omitted
2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_45]
        at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
        at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
        at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.util$send_signal_to_process.invoke(util.clj:415) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267) ~[storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
        at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
        at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
        at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_45]
        ... 19 common frames omitted
2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor 90f0964b-c48c-4cbc-9d1c-57119c56e99c
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor 90f0964b-c48c-4cbc-9d1c-57119c56e99c
2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:host.name=storm-supervisor-01
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.version=1.8.0_45
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/jre-8-oracle-x64/jre
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.class.path=/usr/share/apache-storm-0.9.4/lib/zookeeper-3.4.6.jar:/usr/share/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/share/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/share/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/share/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/share/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/share/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/share/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/share/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/share/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/share/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/share/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/share/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/share/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/share/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/share/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/share/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/share/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/share/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/share/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/share/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/share/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/share/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/share/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/share/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/share/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/share/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/share/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/share/apache-storm-0.9.4/conf
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.name=Linux
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.arch=amd64
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.version=3.16.0-0.bpo.4-amd64
...
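
The "error=2, No such file or directory" in the trace above is ENOENT raised while the JVM was forking the "kill" command. Two conditions that commonly produce it are that no executable named "kill" can be found on the PATH seen by the supervisor process, or that the working directory the child is started in (here ".") no longer exists. Which of the two applies here is not established by the log. The sketch below probes both conditions from inside the container. It is a diagnostic aid only, not Storm code, and the class name ExecProbe and its methods are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ExecProbe {
    /**
     * Looks up an executable by name in a PATH-style string, roughly the way
     * a PATH search does. Simplification: empty PATH entries are skipped,
     * although POSIX treats them as the current directory.
     */
    public static Optional<Path> resolveOnPath(String program, String pathVar) {
        if (pathVar == null || pathVar.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, program);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** True if the directory a child process would be started in still exists. */
    public static boolean workingDirExists(String dir) {
        return Files.isDirectory(Paths.get(dir).toAbsolutePath());
    }

    public static void main(String[] args) {
        String program = args.length > 0 ? args[0] : "kill";
        Optional<Path> found = resolveOnPath(program, System.getenv("PATH"));
        System.out.println(program + " on PATH: "
                + found.map(Path::toString).orElse("not found"));
        System.out.println("working directory exists: " + workingDirExists("."));
    }
}
```

For the result to be meaningful, it would need to run as the same user, with the same environment and from the same directory as the supervisor, for example with "java ExecProbe kill".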



