You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/01/05 21:52:39 UTC
[jira] [Commented] (STORM-695) storm CLI tool reports zero exit
code on error scenario, take 2
[ https://issues.apache.org/jira/browse/STORM-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083760#comment-15083760 ]
ASF GitHub Bot commented on STORM-695:
--------------------------------------
Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/456#issuecomment-169130180
@Lewuathe I am sorry that this fell off of my radar. The code looks great. I am +1
> storm CLI tool reports zero exit code on error scenario, take 2
> ---------------------------------------------------------------
>
> Key: STORM-695
> URL: https://issues.apache.org/jira/browse/STORM-695
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.9.3
> Reporter: Michael Noll
> Assignee: Kai Sasaki
> Priority: Minor
>
> Commands such as "storm kill non-existing-topology" will return an exit code of zero, indicating success when in fact the command failed.
> h3. How to reproduce
> Here is but one example where the {{storm}} CLI tool violates shell best practices:
> {code}
> # Let's kill a topology that is in fact not running in the cluster.
> $ storm kill i-do-not-exist-topo
> <snip>
> Exception in thread "main" NotAliveException(msg:i-do-not-exist-topo is not alive)
> # Print the exit code of last command.
> $ echo $?
> 0 # <<< but since the kill command failed this should be non-zero!
> {code}
> Another example is the "storm jar" command. If you attempt to submit a topology that has the same name as an existing, running topology, the "storm jar" command will not submit the topology -- instead it will print an exception (think: "the topology FooName is already running"), which is ok, but it will then exit with a return code of zero, which indicates success (which is wrong).
> h3. Impact
> This bug prevents automated deployment tools such as Ansible or Puppet as well as ad-hoc CLI scripting (think: fire-fighting Ops teams) to work properly because Storm violates shell conventions by not returning non-zero exit codes in case of failures.
> h3. How to fix
> From what I understand the solution is two-fold:
> # The various Storm commands that are being called by the {{storm}} script must return proper exit codes.
> # The {{storm}} script must store these exit codes and return itself with the respective exit code of the Storm command it actually ran.
> For example, here's the current code that implements the "storm kill" command:
> {code}
> # In: bin/storm
> def kill(*args):
> """Syntax: [storm kill topology-name [-w wait-time-secs]]
> Kills the topology with the name topology-name. Storm will
> first deactivate the topology's spouts for the duration of
> the topology's message timeout to allow all messages currently
> being processed to finish processing. Storm will then shutdown
> the workers and clean up their state. You can override the length
> of time Storm waits between deactivation and shutdown with the -w flag.
> """
> exec_storm_class(
> "backtype.storm.command.kill_topology",
> args=args,
> jvmtype="-client",
> extrajars=[USER_CONF_DIR, STORM_BIN_DIR])
> {code}
> which in turn calls the following code in {{kill_topology.clj}}:
> {code}
> ;; In: backtype.storm.command.kill-topology
> (ns backtype.storm.command.kill-topology
> (:use [clojure.tools.cli :only [cli]])
> (:use [backtype.storm thrift config log])
> (:import [backtype.storm.generated KillOptions])
> (:gen-class))
> (defn -main [& args]
> (let [[{wait :wait} [name] _] (cli args ["-w" "--wait" :default nil :parse-fn #(Integer/parseInt %)])
> opts (KillOptions.)]
> (if wait (.set_wait_secs opts wait))
> (with-configured-nimbus-connection nimbus
> (.killTopologyWithOpts nimbus name opts)
> (log-message "Killed topology: " name)
> )))
> {code}
> which in turn calls the following code in {{nimbus.clj}}:
> {code}
> ;; In: backtype.storm.daemon.nimbus
> (^void killTopologyWithOpts [this ^String storm-name ^KillOptions options]
> (check-storm-active! nimbus storm-name true)
> (let [topology-conf (try-read-storm-conf-from-name conf storm-name nimbus)]
> (check-authorization! nimbus storm-name topology-conf "killTopology"))
> (let [wait-amt (if (.is_set_wait_secs options)
> (.get_wait_secs options)
> )]
> (transition-name! nimbus storm-name [:kill wait-amt] true)
> ))
> {code}
> As you can see the current implementation does not pass success/failure information back to the caller.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)