You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by tgravescs <gi...@git.apache.org> on 2015/11/03 20:55:46 UTC

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

GitHub user tgravescs opened a pull request:

    https://github.com/apache/storm/pull/849

    STORM-1155: Supervisor recurring health checks

    https://issues.apache.org/jira/browse/STORM-1155

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/storm STORM-1155

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #849
    
----
commit 5bd5cf958943007dfe1742f6d4adda8f2a0b75ee
Author: Thomas Graves <tg...@decadefade.corp.ne1.yahoo.com>
Date:   2015-11-02T22:12:34Z

    STORM-1155.  Supervisor recurring health checks

commit c90c57447e3c9d466eabf06912e63c8fa53928da
Author: Thomas Graves <tg...@decadefade.corp.ne1.yahoo.com>
Date:   2015-11-03T19:51:04Z

    Update documentation

commit f6268a08499072e0f091df19605ee64fb3e80ba3
Author: Thomas Graves <tg...@decadefade.corp.ne1.yahoo.com>
Date:   2015-11-03T19:54:42Z

    fix typo

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by zhuoliu <gi...@git.apache.org>.
Github user zhuoliu commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154181737
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154167889
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by zhuoliu <gi...@git.apache.org>.
Github user zhuoliu commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-153483169
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by zhuoliu <gi...@git.apache.org>.
Github user zhuoliu commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154130951
  
    Hi @revans2 and @tgravescs ,
    
    For checking all cases of relative/absolute directory for healthcheck, I would suggest to add the following function in config.clj, then healthcheck.clj can all this function to get the directory.
    
    (defn absolute-healthcheck-dir [conf]
      (let [storm-home (System/getProperty "storm.home")
            path (conf STORM-HEALTH-CHECK-DIR)]
        (if path
          (if (is-absolute-path? path) path (str storm-home file-path-separator path))
          (str storm-home file-path-separator “healthcheck”))))


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by longdafeng <gi...@git.apache.org>.
Github user longdafeng commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-155701669
  
    +1 
    sorry for late response


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/849


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by longdafeng <gi...@git.apache.org>.
Github user longdafeng commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-153917289
  
    the code is fine to me, but this is the first step to do health check:
    (1) where put the script, we had better put the script under $STORM_HOME/healthcheck, because when install storm, the script will be also installed in every node.
    (2) the script depends on OS, in Windows, they should be xxx, in Linux, they should be XXX;
    maybe we need some OS detect action to judge what kind of OS is running. so in the "STORM_HEALTH_CHECK_DIR" dir, there are several directory,  windows/linux/mac, different OS use different dir.
    (3) could you please help to implement healthcheck.clj with java. in a short while, the storm will contain two core, one is clojure, the other is java. so could you please help to implement healthcheck.clj. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-153546831
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154127266
  
    @longdafeng it might be nice in the future to have some "standard" health check scripts come with storm, but honestly even with just Linux, it is difficult to come up with something that will work on all Linux distros, or even on all boxes with the same distro.
    
    The things I see as potential issues would be the network card fell back to 100Mbit.  But that does not work if you are running on old 100Mbit hardware, or what if for some reason you are using infiniband, with a 100Mbit control path.  Or if you are using bonded NICs?  There are too many variables in my opinion to really make a lot of these generic.
    
    The default directory is `$STORM_HOME/healthcheck` already.  Directories are usually relative to $STORM_HOME, but there have been a few cases where we needed to make it explicit.  @zhuoliu you did a lot of work around resolving relative paths in configs.  could you take a look at this and see if we need to change anything to make this work so it is always relative to `$STORM_HOME`/?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154156435
  
    added the absolute path check and upmerged


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154118076
  
    I'd way we can convert this to java later when we do the merge.  
    1) the directory is already configurable I don't follow what you are asking
    2) I don't have windows box on here to test but since its all configuration and user writes the scripts I think it will work on windows already. If not perhaps we can file follow on for folks who have access to test windows to fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by hustfxj <gi...@git.apache.org>.
Github user hustfxj commented on a diff in the pull request:

    https://github.com/apache/storm/pull/849#discussion_r45436295
  
    --- Diff: storm-core/src/clj/backtype/storm/command/healthcheck.clj ---
    @@ -0,0 +1,88 @@
    +;; Licensed to the Apache Software Foundation (ASF) under one
    +;; or more contributor license agreements.  See the NOTICE file
    +;; distributed with this work for additional information
    +;; regarding copyright ownership.  The ASF licenses this file
    +;; to you under the Apache License, Version 2.0 (the
    +;; "License"); you may not use this file except in compliance
    +;; with the License.  You may obtain a copy of the License at
    +;;
    +;; http://www.apache.org/licenses/LICENSE-2.0
    +;;
    +;; Unless required by applicable law or agreed to in writing, software
    +;; distributed under the License is distributed on an "AS IS" BASIS,
    +;; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +;; See the License for the specific language governing permissions and
    +;; limitations under the License.
    +(ns backtype.storm.command.healthcheck
    +  (:require [backtype.storm
    +             [config :refer :all]
    +             [log :refer :all]]
    +            [clojure.java [io :as io]]
    +            [clojure [string :refer [split]]])
    +  (:gen-class))
    +
    +(defn interrupter
    +  "Interrupt a given thread after ms milliseconds."
    +  [thread ms]
    +  (let [interrupter (Thread.
    +                     (fn []
    +                       (try
    +                         (Thread/sleep ms)
    +                         (.interrupt thread)
    +                         (catch InterruptedException e))))]
    +    (.start interrupter)
    +    interrupter))
    +
    +(defn check-output [lines]
    +  (if (some #(.startsWith % "ERROR") lines)
    +    :failed
    +    :success))
    +
    +(defn process-script [conf script]
    +  (let [script-proc (. (Runtime/getRuntime) (exec script))
    +        curthread (Thread/currentThread)
    +        interrupter-thread (interrupter curthread
    +                                        (conf STORM-HEALTH-CHECK-TIMEOUT-MS))]
    +    (try
    +      (.waitFor script-proc)
    +      (.interrupt interrupter-thread)
    --- End diff --
    
    @revans2  If script-proc is blocked,then throw InterruptedException and println "Script" script "timed out.".But the script-proc isn't really stop.Like that:
    admin    12755     1  0 12:49 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    12978     1  0 12:50 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13228     1  0 12:51 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13504     1  0 12:52 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    admin    13644 13465  0 12:52 pts/0    00:00:00 /bin/sh /home/admin/test/healthCheck.sh
    
    Maybe we can stop the process ?
    
    (defn interrupter
    +  "Interrupt a given thread after ms milliseconds."
    +  [script-proc ms]
    +  (let [interrupter (Thread.
    +                     (fn []
    +                       (try
    +                         (Thread/sleep ms)
    +                         (.destory script-proc)
    +                         (catch InterruptedException e))))]
    +    (.start interrupter)
    +    interrupter))



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-153472081
  
    +1 looks good.  It might be nice to mention in the documentation that if you are running the supervisor under supervision that calling `storm health-check` will let you know if you should run the supervisor or not. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-153484376
  
    Added note about being under supervision.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154182207
  
    travis failed with disrupterQueueTest failures which is unrelated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1155: Supervisor recurring health checks

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/849#issuecomment-154437494
  
    @longdafeng do you have any more comments on this pull request?  I would like to merge this in soon if you are OK with it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---