Posted to yarn-issues@hadoop.apache.org by "Wilfred Spiegelenburg (JIRA)" <ji...@apache.org> on 2016/08/30 03:12:20 UTC

[jira] [Reopened] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

     [ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg reopened YARN-5567:
-----------------------------------------

The script handling code contains extensive comments explaining why the exit code is ignored and why a non-zero exit code should not change the health status:
{code}
    * The node is marked unhealthy if
    * <ol>
    * <li>The node health script times out</li>
    * <li>The node health scripts output has a line which begins with ERROR</li>
    * <li>An exception is thrown while executing the script</li>
    * </ol>
    * If the script throws {@link IOException} or {@link ExitCodeException} the
    * output is ignored and node is left remaining healthy, as script might
    * have syntax error.
{code}

The change we just committed breaks all of this: we no longer ignore the exit code, and a non-zero exit code now marks the node as unhealthy. I assume the original behaviour was chosen for a reason, so we may have just introduced a backwards-incompatible behavioural change.

Looking at the underlying {{ShellCommandExecutor}} and tracing back to the {{Shell.runCommand()}} method: any non-zero exit code results in an {{ExitCodeException}} being thrown.

If we are going to change the documented behaviour, we should not do it in release 2.8.1, and we should also update all related documentation.
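
To make the difference concrete, here is a minimal, self-contained sketch of the decision as I read it from the comment quoted above. It is not the actual {{NodeHealthScriptRunner}} code: the class {{HealthStatusSketch}}, the {{Outcome}} enum, the {{isHealthy}} method and the {{exitCodeMarksUnhealthy}} flag are all hypothetical names used only for illustration. Only {{FAILED_WITH_EXIT_CODE}} is taken from the issue itself.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: models the documented rules, not the real NodeManager code.
final class HealthStatusSketch {

    // Hypothetical outcomes of one health script run.
    enum Outcome { SUCCESS, TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED_WITH_EXIT_CODE }

    // Returns true if the node would be reported healthy.
    // exitCodeMarksUnhealthy = false models the documented behaviour,
    // exitCodeMarksUnhealthy = true models the behaviour after the patch.
    static boolean isHealthy(Outcome outcome, List<String> outputLines,
                             boolean exitCodeMarksUnhealthy) {
        // A timeout or an unexpected exception marks the node unhealthy.
        if (outcome == Outcome.TIMED_OUT || outcome == Outcome.FAILED_WITH_EXCEPTION) {
            return false;
        }
        // Non-zero exit code: the output is ignored, the result depends on the flag.
        if (outcome == Outcome.FAILED_WITH_EXIT_CODE) {
            return !exitCodeMarksUnhealthy;
        }
        // Script ran to completion: unhealthy only if a line begins with ERROR.
        for (String line : outputLines) {
            if (line.startsWith("ERROR")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList("ERROR disk full");
        // Documented behaviour: the exit code failure is ignored.
        System.out.println(isHealthy(Outcome.FAILED_WITH_EXIT_CODE, output, false)); // true
        // Patched behaviour: the same run now marks the node unhealthy.
        System.out.println(isHealthy(Outcome.FAILED_WITH_EXIT_CODE, output, true));  // false
    }
}
```

Under these assumptions, a health script that merely has a syntax error (and therefore exits non-zero) leaves the node healthy with the documented behaviour, but takes the node out of service once the exit code is honoured. That is the incompatibility I am concerned about.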

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --------------------------------------------------------------------------
>
>                 Key: YARN-5567
>                 URL: https://issues.apache.org/jira/browse/YARN-5567
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>             Fix For: 2.8.1
>
>         Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>       case FAILED_WITH_EXIT_CODE:
>         setHealthStatus(true, "", now);
>         break;
> {code}
> should be 
> {code}
>       case FAILED_WITH_EXIT_CODE:
>         setHealthStatus(false, "", now);
>         break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
