Posted to yarn-issues@hadoop.apache.org by "Jim Brennan (Jira)" <ji...@apache.org> on 2020/07/08 16:52:00 UTC

[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes

    [ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153754#comment-17153754 ] 

Jim Brennan commented on YARN-10348:
------------------------------------

Patch 001 adds a new YARN configuration property:
{noformat}
public static final String RM_DELEGATION_TOKEN_ALWAYS_CANCEL =
    RM_PREFIX + "delegation-token.always-cancel";
public static final boolean DEFAULT_RM_DELEGATION_TOKEN_ALWAYS_CANCEL = false; 
{noformat}
Internally we default this to true, but to maintain compatibility I've set it to false in this patch.

If this property is true, we effectively ignore the {{shouldCancelAtEnd}} parameter that came from the client.
We have been running with this change in production internally for about two years.
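
As a rough illustration of the behavior described above (a minimal sketch, not the code from the patch): the class {{AlwaysCancelSketch}}, the helper {{effectiveCancelAtEnd}}, and the use of a plain {{Map}} in place of the Hadoop configuration object are all hypothetical, and the full key assumes {{RM_PREFIX}} expands to {{yarn.resourcemanager.}}.

```java
import java.util.Map;

public class AlwaysCancelSketch {
    // Assumed expansion of RM_PREFIX + "delegation-token.always-cancel".
    static final String RM_DELEGATION_TOKEN_ALWAYS_CANCEL =
        "yarn.resourcemanager.delegation-token.always-cancel";
    static final boolean DEFAULT_RM_DELEGATION_TOKEN_ALWAYS_CANCEL = false;

    // Hypothetical helper: decides whether the RM cancels an app's tokens
    // once the app completes. The Map stands in for the real configuration.
    static boolean effectiveCancelAtEnd(Map<String, String> conf,
                                        boolean shouldCancelAtEnd) {
        String raw = conf.get(RM_DELEGATION_TOKEN_ALWAYS_CANCEL);
        boolean alwaysCancel = (raw == null)
            ? DEFAULT_RM_DELEGATION_TOKEN_ALWAYS_CANCEL
            : Boolean.parseBoolean(raw);
        // When always-cancel is on, the client's request is ignored.
        return alwaysCancel || shouldCancelAtEnd;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of();
        Map<String, String> always =
            Map.of(RM_DELEGATION_TOKEN_ALWAYS_CANCEL, "true");

        // Default (false): the client's choice is honored.
        System.out.println(effectiveCancelAtEnd(defaults, false));
        // Always-cancel enabled: the client's cancel=false is overridden.
        System.out.println(effectiveCancelAtEnd(always, false));
    }
}
```

With the default of false, a client that passes cancel=false keeps its tokens after the app completes; with the property set to true, the same request still results in cancellation.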

> Allow RM to always cancel tokens after app completes
> ----------------------------------------------------
>
>                 Key: YARN-10348
>                 URL: https://issues.apache.org/jira/browse/YARN-10348
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.10.0, 3.1.3
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: YARN-10348.001.patch
>
>
> (Note: this change was originally done on our internal branch by [~daryn]).
> The RM currently has an option for a client to disable token cancellation when a job completes. This feature was an initial attempt to address the use case of a job launching sub-jobs (e.g., the Oozie launcher) where the original job finishes before the sub-jobs complete, so that completion of the original job triggered premature cancellation of tokens still needed by the sub-jobs.
> Many years ago, [~daryn] added a more robust implementation that reference-counts tokens ([YARN-3055]). It defers cancellation of a token until all apps using that token have completed, which removed the need for a client to specify cancel=false. Unfortunately the config option was not removed.
> We have seen cases where Oozie "java actions" and some users were explicitly disabling token cancellation. This can lead to a buildup of defunct tokens that may overwhelm the ZK buffer used by the KDC's backing store, at which point the KMS fails to connect to ZK and is unable to issue or validate new tokens, leaving the KDC able to authenticate only pre-existing tokens. Production incidents have occurred due to the buffer size issue.
> To avoid these issues, the RM should have the option to ignore/override the client's request to not cancel tokens.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
