Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (Jira)" <ji...@apache.org> on 2019/09/25 08:37:00 UTC
[jira] [Comment Edited] (YARN-6715) Fix documentation about NodeHealthScriptRunner
[ https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937519#comment-16937519 ]
Szilard Nemeth edited comment on YARN-6715 at 9/25/19 8:36 AM:
---------------------------------------------------------------
Hi [~pbacsko]!
+1 on the latest patch!
Committed this to trunk!
Do you think it's worthwhile to backport to branch-3.2 and branch-3.1 as well?
Thanks!
was (Author: snemeth):
Hi [~pbacsko]!
+1 on the latest patch!
Committing this to trunk!
Do you think it's worthwhile to backport to branch-3.2 and branch-3.1 as well?
Thanks!
> Fix documentation about NodeHealthScriptRunner
> -----------------------------------------------
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
> Issue Type: Bug
> Components: documentation, nodemanager
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, YARN-6715-003.patch
>
>
> NodeHealthScriptRunner does *not* report bad health if the script exits with a non-zero exit code. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
>   void reportHealthStatus(HealthCheckerExitStatus status) {
>     long now = System.currentTimeMillis();
>     switch (status) {
>     case SUCCESS:
>       setHealthStatus(true, "", now);
>       break;
>     case TIMED_OUT:
>       setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
>       break;
>     case FAILED_WITH_EXCEPTION:
>       setHealthStatus(false, exceptionStackTrace);
>       break;
>     case FAILED_WITH_EXIT_CODE:
>       setHealthStatus(true, "", now);
>       break;
>     case FAILED:
>       setHealthStatus(false, shexec.getOutput());
>       break;
>     }
>   }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but it conflicts with the upstream documentation, which says:
> "If the script *exits with a non-zero exit code*, times out or results in an exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might also add a comment to {{reportHealthStatus()}} explaining that the {{FAILED_WITH_EXIT_CODE}} behavior is intentional and not a bug.
> This case also lacks unit test coverage.
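
For illustration only, the status-to-health mapping in the quoted switch statement can be restated as a small stand-alone sketch. The class name {{HealthStatusSketch}} and the method {{isHealthy}} are hypothetical and are not part of NodeHealthScriptRunner; only the enum constant names come from the snippet above. The sketch returns the boolean that the quoted code passes as the first argument to {{setHealthStatus}}, which makes it easy to see that a non-zero exit code alone leaves the node healthy.

```java
// Hypothetical stand-alone sketch; this is NOT the actual NodeHealthScriptRunner class.
class HealthStatusSketch {

    // Constant names copied from the switch statement quoted in the issue.
    enum HealthCheckerExitStatus {
        SUCCESS, TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED_WITH_EXIT_CODE, FAILED
    }

    // Returns the boolean that the quoted code passes as the first
    // argument of setHealthStatus(...) for the given status.
    static boolean isHealthy(HealthCheckerExitStatus status) {
        switch (status) {
            case SUCCESS:
            case FAILED_WITH_EXIT_CODE: // intentional per YARN-5567: a non-zero exit code is not reported as unhealthy
                return true;
            case TIMED_OUT:
            case FAILED_WITH_EXCEPTION:
            case FAILED:
                return false;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }

    public static void main(String[] args) {
        // Print the mapping for every status.
        for (HealthCheckerExitStatus s : HealthCheckerExitStatus.values()) {
            System.out.println(s + " -> healthy=" + isHealthy(s));
        }
    }
}
```

A unit test for the missing coverage could assert the same thing against the real class; the checks would mirror the mapping above rather than this sketch's helper.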