Posted to yarn-issues@hadoop.apache.org by "Tao Yang (JIRA)" <ji...@apache.org> on 2019/07/19 10:00:01 UTC

[jira] [Updated] (YARN-9686) Reduce visibility of blacklisted nodes information (only for current app attempt) to avoid the abuse of memory

     [ https://issues.apache.org/jira/browse/YARN-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-9686:
---------------------------
    Attachment: YARN-9686.001.patch

> Reduce visibility of blacklisted nodes information (only for current app attempt) to avoid the abuse of memory
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9686
>                 URL: https://issues.apache.org/jira/browse/YARN-9686
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9686.001.patch
>
>
> We recently hit an issue in which the RM went through a long GC pause, and the RM log contained many WARN messages (Ignoring Blacklists, blacklist size 1775 is more than failure threshold ratio 0.20000000298023224 out of total usable nodes 1778) at a very high rate, more than 30,000 per second.
> The direct cause is that a few apps with a large number of attempts and many blacklisted nodes were queried frequently via the REST API or the web UI. For every such request, the RM has to allocate new memory for the blacklisted nodes many times (N * NUM_ATTEMPTS).
> Currently both AM (system) blacklisted nodes and app blacklisted nodes are carried over across app attempts, and there is only one instance of each, so traversing all blacklisted nodes for every app attempt is redundant and costly. I therefore propose to fetch and show blacklisted nodes only for the current app attempt, to improve performance and avoid wasting memory in similar scenarios.
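
To make the N * NUM_ATTEMPTS cost in the description concrete, here is a minimal sketch. It is not the ResourceManager code and not the contents of YARN-9686.001.patch; the class and method names (BlacklistVisibilitySketch, entriesCopiedForAllAttempts, entriesCopiedForCurrentAttempt) are hypothetical, and it assumes that building the report for one attempt copies the shared blacklist once. It only counts how many node-name entries get copied per request under each approach.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistVisibilitySketch {

    // Hypothetical stand-in for the single blacklist instance that is
    // shared by all attempts of one application.
    static Set<String> sharedBlacklist(int nodes) {
        Set<String> blacklist = new HashSet<>();
        for (int i = 0; i < nodes; i++) {
            blacklist.add("node-" + i);
        }
        return blacklist;
    }

    // Behaviour as described in the issue: every attempt's report gets its
    // own copy of the same blacklist, so one request copies N * NUM_ATTEMPTS entries.
    static long entriesCopiedForAllAttempts(Set<String> blacklist, int numAttempts) {
        long copied = 0;
        for (int attempt = 0; attempt < numAttempts; attempt++) {
            List<String> copy = new ArrayList<>(blacklist);
            copied += copy.size();
        }
        return copied;
    }

    // Proposed behaviour: only the current attempt's report carries the
    // blacklist, so one request copies N entries.
    static long entriesCopiedForCurrentAttempt(Set<String> blacklist) {
        List<String> copy = new ArrayList<>(blacklist);
        return copy.size();
    }

    public static void main(String[] args) {
        Set<String> blacklist = sharedBlacklist(1775); // size taken from the WARN message
        int numAttempts = 10;                          // assumed value, for illustration only
        System.out.println("all attempts:    " + entriesCopiedForAllAttempts(blacklist, numAttempts));
        System.out.println("current attempt: " + entriesCopiedForCurrentAttempt(blacklist));
    }
}
```

With the assumed 10 attempts, a single request copies ten times as many entries as the proposed behaviour would, and that factor grows with the number of attempts.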

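The numbers in the WARN message quoted in the description can be checked with a short sketch. This is an illustration under stated assumptions, not the ResourceManager's implementation: the class and method names (BlacklistThresholdSketch, ignoresBlacklist) are hypothetical, and the strict "more than" comparison simply follows the wording of the log line. The long ratio in the log is consistent with a 0.2 float value widened to double.

```java
public class BlacklistThresholdSketch {

    // Hypothetical helper: the blacklist is ignored once its size exceeds
    // ratio * usableNodes. Whether the real boundary is inclusive is an assumption.
    static boolean ignoresBlacklist(int blacklistSize, double ratio, int usableNodes) {
        double failureThreshold = ratio * usableNodes;
        return blacklistSize > failureThreshold;
    }

    public static void main(String[] args) {
        // 0.2f widened to double prints as the ratio seen in the log.
        double ratio = 0.2f;
        System.out.println("ratio = " + ratio);

        // Values from the WARN message: 1775 blacklisted out of 1778 usable nodes.
        System.out.println("threshold = " + (ratio * 1778));
        System.out.println("ignored = " + ignoresBlacklist(1775, ratio, 1778));
    }
}
```

With a ratio of about 0.2 and 1778 usable nodes, the threshold is roughly 355.6 nodes, so a blacklist of 1775 nodes is far above it and the warning is emitted on every evaluation.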


--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
