Posted to issues@storm.apache.org by "David Judd (JIRA)" <ji...@apache.org> on 2017/03/29 21:48:41 UTC

[jira] [Created] (STORM-2439) HealthCheck feature does not work

David Judd created STORM-2439:
---------------------------------

             Summary: HealthCheck feature does not work
                 Key: STORM-2439
                 URL: https://issues.apache.org/jira/browse/STORM-2439
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.3
            Reporter: David Judd


There are a few issues with this feature:

1. The default timeout value produces `java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long at org.apache.storm.command.HealthCheck.processScript(HealthCheck.java:79)`. The value, 5000, is automatically deserialized by Jackson as an Integer, but the code attempts to cast it directly to a Long. (I successfully worked around this by setting a timeout greater than the maximum int, so that the value no longer fits in an Integer.)

2. The documentation says that a script should print "ERROR" if the node is unhealthy, but in fact the script must *also* exit with a non-zero exit code for the node to be treated as unhealthy. This appears to be the opposite of what is intended, given a code comment that says "We treat non-zero exit codes as indicators that the scripts failed to execute properly, not that the system is unhealthy". I believe the test on this line is inverted: https://github.com/apache/storm/blob/70102643e74d577728adf5f8719920d1bf60e98a/storm-core/src/jvm/org/apache/storm/command/HealthCheck.java#L97

3. Even with workarounds for the above two bugs, a failing health check does not cause workers to shut down in my testing with Storm 1.0.3. I have not determined the cause, but because the previous two issues suggest to me that this code is rarely if ever tested, I do not plan to investigate further at the moment.
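To make the first two points concrete, here is a minimal, self-contained sketch. It is not the actual Storm code: the class `HealthCheckSketch`, the methods `timeoutMs` and `classify`, and the `Result` enum are hypothetical names, and the config key string is assumed from the Storm configuration rather than taken from this report. It shows why a direct cast of a boxed Integer to Long fails, how going through `Number.longValue()` avoids that, and what the exit-code handling might look like if it followed the quoted comment (assuming "prints ERROR" means an output line beginning with "ERROR").

```java
import java.util.HashMap;
import java.util.Map;

public class HealthCheckSketch {
    // Assumed config key; treat as illustrative, not confirmed by this report.
    static final String TIMEOUT_KEY = "storm.health.check.timeout.ms";

    // Hypothetical outcome type for a single health check script run.
    enum Result { HEALTHY, UNHEALTHY, SCRIPT_FAILED }

    // Read the timeout without assuming which boxed numeric type the
    // config parser produced (Integer for 5000, Long for larger values).
    static long timeoutMs(Map<String, Object> conf) {
        Object raw = conf.get(TIMEOUT_KEY);
        if (!(raw instanceof Number)) {
            throw new IllegalArgumentException(TIMEOUT_KEY + " must be numeric, got: " + raw);
        }
        return ((Number) raw).longValue();
    }

    // Classification as the quoted comment describes it: a non-zero exit code
    // means the script itself failed, not that the node is unhealthy.
    static Result classify(int exitCode, String stdout) {
        if (exitCode != 0) {
            return Result.SCRIPT_FAILED;
        }
        // Assumption: "ERROR" must appear at the start of an output line.
        for (String line : stdout.split("\\R")) {
            if (line.startsWith("ERROR")) {
                return Result.UNHEALTHY;
            }
        }
        return Result.HEALTHY;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(TIMEOUT_KEY, 5000); // autoboxed to Integer, as in the report

        try {
            // Reproduces the failure mode: Integer is not a subtype of Long.
            Long broken = (Long) conf.get(TIMEOUT_KEY);
            System.out.println("unexpected success: " + broken);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: ClassCastException");
        }

        System.out.println("safe read: " + timeoutMs(conf));
        System.out.println(classify(0, "ERROR disk full\n"));
        System.out.println(classify(1, ""));
    }
}
```

Under these assumptions, a script that prints an ERROR line and exits with 0 is classified as unhealthy, while any non-zero exit is reported separately as a script failure. This is only one reading of the intended behavior.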

If this feature is, as it appears, untested and non-functional, I would suggest that it be removed from the code and documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)