You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/01/05 21:52:39 UTC

[jira] [Commented] (STORM-695) storm CLI tool reports zero exit code on error scenario, take 2

    [ https://issues.apache.org/jira/browse/STORM-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083760#comment-15083760 ] 

ASF GitHub Bot commented on STORM-695:
--------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/456#issuecomment-169130180
  
    @Lewuathe I am sorry that this fell off of my radar.  The code looks great. I am +1


> storm CLI tool reports zero exit code on error scenario, take 2
> ---------------------------------------------------------------
>
>                 Key: STORM-695
>                 URL: https://issues.apache.org/jira/browse/STORM-695
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.3
>            Reporter: Michael Noll
>            Assignee: Kai Sasaki
>            Priority: Minor
>
> Commands such as "storm kill non-existing-topology" will return an exit code of zero, indicating success when in fact the command failed.
> h3. How to reproduce
> Here is but one example where the {{storm}} CLI tool violates shell best practices:
> {code}
> # Let's kill a topology that is in fact not running in the cluster.
> $ storm kill i-do-not-exist-topo
> <snip>
> Exception in thread "main" NotAliveException(msg:i-do-not-exist-topo is not alive)
> # Print the exit code of last command.
> $ echo $?
> 0  # <<< but since the kill command failed this should be non-zero!
> {code}
> Another example is the "storm jar" command.  If you attempt to submit a topology that has the same name as an existing, running topology, the "storm jar" command will not submit the topology -- instead it will print an exception (think: "the topology FooName is already running"), which is ok, but it will then exit with a return code of zero, which indicates success (which is wrong).
> h3. Impact
> This bug prevents automated deployment tools such as Ansible or Puppet as well as ad-hoc CLI scripting (think: fire-fighting Ops teams) to work properly because Storm violates shell conventions by not returning non-zero exit codes in case of failures.
> h3. How to fix
> From what I understand the solution is two-fold:
> # The various Storm commands that are being called by the {{storm}} script must return proper exit codes.
> # The {{storm}} script must store these exit codes and return itself with the respective exit code of the Storm command it actually ran.
> For example, here's the current code that implements the "storm kill" command:
> {code}
> # In: bin/storm
> def kill(*args):
>     """Syntax: [storm kill topology-name [-w wait-time-secs]]
>     Kills the topology with the name topology-name. Storm will 
>     first deactivate the topology's spouts for the duration of 
>     the topology's message timeout to allow all messages currently 
>     being processed to finish processing. Storm will then shutdown 
>     the workers and clean up their state. You can override the length 
>     of time Storm waits between deactivation and shutdown with the -w flag.
>     """
>     exec_storm_class(
>         "backtype.storm.command.kill_topology", 
>         args=args, 
>         jvmtype="-client", 
>         extrajars=[USER_CONF_DIR, STORM_BIN_DIR])
> {code}
> which in turn calls the following code in {{kill_topology.clj}}:
> {code}
> ;; In: backtype.storm.command.kill-topology
> (ns backtype.storm.command.kill-topology
>   (:use [clojure.tools.cli :only [cli]])
>   (:use [backtype.storm thrift config log])
>   (:import [backtype.storm.generated KillOptions])
>   (:gen-class))
> (defn -main [& args]
>   (let [[{wait :wait} [name] _] (cli args ["-w" "--wait" :default nil :parse-fn #(Integer/parseInt %)])
>         opts (KillOptions.)]
>     (if wait (.set_wait_secs opts wait))
>     (with-configured-nimbus-connection nimbus
>       (.killTopologyWithOpts nimbus name opts)
>       (log-message "Killed topology: " name)
>       )))
> {code}
> which in turn calls the following code in {{nimbus.clj}}:
> {code}
> ;; In: backtype.storm.daemon.nimbus
>       (^void killTopologyWithOpts [this ^String storm-name ^KillOptions options]
>         (check-storm-active! nimbus storm-name true)
>         (let [topology-conf (try-read-storm-conf-from-name conf storm-name nimbus)]
>           (check-authorization! nimbus storm-name topology-conf "killTopology"))
>         (let [wait-amt (if (.is_set_wait_secs options)
>                          (.get_wait_secs options)                         
>                          )]
>           (transition-name! nimbus storm-name [:kill wait-amt] true)
>           ))
> {code}
> As you can see the current implementation does not pass success/failure information back to the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)