Posted to yarn-issues@hadoop.apache.org by "Naganarasimha G R (JIRA)" <ji...@apache.org> on 2017/05/03 14:45:04 UTC

[jira] [Commented] (YARN-6523) RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster

    [ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994995#comment-15994995 ] 

Naganarasimha G R commented on YARN-6523:
-----------------------------------------

Sorry for the delayed response [~jlowe],
Thanks for the very detailed reply. I agree that the delta approaches mentioned initially can introduce a certain amount of complexity in the cases you described.
Though the approach you suggested was initially appealing and less complicated, I was thinking of the following scenarios:
# When there are a large number of small jobs in a large cluster, we end up sending the tokens almost every time, since the sequence keeps increasing as more and more jobs get submitted.
# Since we are modifying the interface anyway, it would be better to go for a complete solution so that it is not revisited later for deprecation.

One other approach I can think of: send all the tokens during node registration (this will avoid most of the corner cases), and as part of the heartbeat send the app tokens (all) which have been renewed (which can be done in an event-based model). Further, we can keep a pre-computed cache of the SystemCredentialsForAppsProto objects that are sent as part of the heartbeat, so that we reduce the memory footprint. This approach would thus also handle the large number of small jobs case, without an interface change. Thoughts?
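
To make the flow concrete, here is a minimal sketch in Java of the registration-plus-renewal idea. It is an illustration under assumptions, not the actual RM code: TokenDistributor, CachedAppCredentials and all method names are hypothetical, the cached entry is only a stand-in for a pre-built SystemCredentialsForAppsProto, and a newly submitted app is assumed to raise the same event as a renewal. It also does not handle a lost heartbeat response, which would need extra handling.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for one pre-built SystemCredentialsForAppsProto. */
final class CachedAppCredentials {
    final String appId;
    final ByteBuffer tokens; // read-only view, shared by every response

    CachedAppCredentials(String appId, byte[] tokens) {
        this.appId = appId;
        this.tokens = ByteBuffer.wrap(tokens.clone()).asReadOnlyBuffer();
    }
}

/** Hypothetical RM-side helper; not part of the real YARN API. */
final class TokenDistributor {
    // One cached entry per app, built once per renewal instead of once
    // per node per heartbeat.
    private final Map<String, CachedAppCredentials> cache =
        new ConcurrentHashMap<>();
    // Per node: apps whose tokens changed since that node's last heartbeat.
    private final Map<String, Set<String>> pendingByNode =
        new ConcurrentHashMap<>();

    /** Event: an app's tokens were added or renewed. */
    void onTokensRenewed(String appId, byte[] tokens) {
        cache.put(appId, new CachedAppCredentials(appId, tokens));
        for (Set<String> pending : pendingByNode.values()) {
            pending.add(appId);
        }
    }

    /** Event: the app finished, so its tokens no longer need to be sent. */
    void onAppFinished(String appId) {
        cache.remove(appId);
        for (Set<String> pending : pendingByNode.values()) {
            pending.remove(appId);
        }
    }

    /** Node registration: the node receives the full current set. */
    List<CachedAppCredentials> onNodeRegistered(String nodeId) {
        pendingByNode.put(nodeId, ConcurrentHashMap.newKeySet());
        return new ArrayList<>(cache.values());
    }

    /** Heartbeat: only entries renewed since the previous heartbeat. */
    List<CachedAppCredentials> onHeartbeat(String nodeId) {
        List<CachedAppCredentials> out = new ArrayList<>();
        Set<String> pending = pendingByNode.get(nodeId);
        if (pending == null) {
            return out; // unknown node: it has to register first
        }
        for (String appId : new ArrayList<>(pending)) {
            if (pending.remove(appId)) {
                CachedAppCredentials cached = cache.get(appId);
                if (cached != null) {
                    out.add(cached);
                }
            }
        }
        return out;
    }
}
```

In this sketch a heartbeat with no renewals returns an empty list, and every node shares the same cached object per app, so the number of credential objects grows with the number of apps rather than with apps times nodes.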

> RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications' tokens, even though not all applications may be active on the node. On top of that, NodeHeartbeatResponsePBImpl converts the tokens for each app into SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 8GB RAM configured for the RM.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org