Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/11/03 20:56:27 UTC
[jira] [Commented] (STORM-1155) Supervisor recurring health checks

    [ https://issues.apache.org/jira/browse/STORM-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987988#comment-14987988 ] 

ASF GitHub Bot commented on STORM-1155:
---------------------------------------

GitHub user tgravescs opened a pull request:

    https://github.com/apache/storm/pull/849

    STORM-1155: Supervisor recurring health checks

    https://issues.apache.org/jira/browse/STORM-1155

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/storm STORM-1155

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #849
    
----
commit 5bd5cf958943007dfe1742f6d4adda8f2a0b75ee
Author: Thomas Graves <tg...@decadefade.corp.ne1.yahoo.com>
Date:   2015-11-02T22:12:34Z

    STORM-1155.  Supervisor recurring health checks

commit c90c57447e3c9d466eabf06912e63c8fa53928da
Author: Thomas Graves <tg...@decadefade.corp.ne1.yahoo.com>
Date:   2015-11-03T19:51:04Z

    Update documentation

commit f6268a08499072e0f091df19605ee64fb3e80ba3
Author: Thomas Graves <tg...@decadefade.corp.ne1.yahoo.com>
Date:   2015-11-03T19:54:42Z

    fix typo

----


> Supervisor recurring health checks
> ----------------------------------
>
>                 Key: STORM-1155
>                 URL: https://issues.apache.org/jira/browse/STORM-1155
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> Add the ability for the supervisor to call out to health check scripts so that it can validate the health of the node it is running on.
> The supervisor could regularly run the scripts in a directory provided by the cluster admin. If any script reports a problem, the supervisor should kill its workers and stop itself.
> This could work very much like the Hadoop health check scripts: if a script returns ERROR on stdout, the node has some issue and the supervisor should shut down.
> A non-zero exit code indicates that the script itself failed to execute properly, so in that case the node should not be marked as unhealthy.
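
The decision rules in the description above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in the pull request: the names classify, check_node and should_shut_down are hypothetical, matching ERROR at the start of a stdout line is an assumption borrowed from the Hadoop convention, and treating a timed-out script as a script failure is also an assumption.

```python
import os
import subprocess

HEALTHY = "healthy"
UNHEALTHY = "unhealthy"
SCRIPT_FAILED = "script_failed"


def classify(exit_code, stdout):
    """Classify one script run (hypothetical helper, not Storm's API)."""
    # A non-zero exit code means the script itself did not run properly,
    # so it must not mark the node as unhealthy.
    if exit_code != 0:
        return SCRIPT_FAILED
    # Assumption: like Hadoop, look for a stdout line starting with ERROR.
    if any(line.startswith("ERROR") for line in stdout.splitlines()):
        return UNHEALTHY
    return HEALTHY


def check_node(script_dir, timeout_secs=5.0):
    """Run every executable file in script_dir and classify each result."""
    results = {}
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        try:
            proc = subprocess.run(
                [path], capture_output=True, text=True, timeout=timeout_secs
            )
            results[name] = classify(proc.returncode, proc.stdout)
        except (subprocess.TimeoutExpired, OSError):
            # Assumption: a hung or unlaunchable script counts as a script
            # failure rather than as an unhealthy node.
            results[name] = SCRIPT_FAILED
    return results


def should_shut_down(results):
    """The supervisor stops only if some script reported the node unhealthy."""
    return any(status == UNHEALTHY for status in results.values())
```

A supervisor-like loop would call check_node on a timer and, when should_shut_down returns True, kill its workers and exit. The exit code is checked before stdout so that a broken script can never take a node out of service by accident.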


