Posted to yarn-issues@hadoop.apache.org by "zhuqi (JIRA)" <ji...@apache.org> on 2019/06/22 03:14:00 UTC

[jira] [Updated] (YARN-9634) Make yarn submit dir and log aggregation dir more evenly distributed

     [ https://issues.apache.org/jira/browse/YARN-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated YARN-9634:
------------------------
    Description: When the cluster is large, the directories where users submit jobs and where container logs are aggregated, along with other information, fill up the HDFS directory, because an HDFS directory has a default storage limit. This can be addressed by configuring "yarn.log-aggregation.retain-seconds". However, the FSNamesystemLock#writeLock contention and the RPC operations triggered by operations on these directories affect the namespace in which the directories are located. To mitigate this, we have placed these directories in one single HDFS federation namespace, but as the cluster becomes huge, that single namespace also degrades RPC performance. In response to this situation, we can distribute these directories across directories in multiple namespaces, choosing among them with a policy such as a hash policy or a round-robin policy.  (was: When the cluster is large, the directories where users submit jobs and where container logs are aggregated, along with other information, fill up the HDFS directory, because an HDFS directory has a default storage limit. In response to this situation, we can distribute these directories more widely, choosing among them with a policy such as a hash policy or a round-robin policy.)

> Make yarn submit dir and log aggregation dir more evenly distributed
> --------------------------------------------------------------------
>
>                 Key: YARN-9634
>                 URL: https://issues.apache.org/jira/browse/YARN-9634
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>
> When the cluster is large, the directories where users submit jobs and where container logs are aggregated, along with other information, fill up the HDFS directory, because an HDFS directory has a default storage limit. This can be addressed by configuring "yarn.log-aggregation.retain-seconds". However, the FSNamesystemLock#writeLock contention and the RPC operations triggered by operations on these directories affect the namespace in which the directories are located. To mitigate this, we have placed these directories in one single HDFS federation namespace, but as the cluster becomes huge, that single namespace also degrades RPC performance. In response to this situation, we can distribute these directories across directories in multiple namespaces, choosing among them with a policy such as a hash policy or a round-robin policy.
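
The two policies named in the description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch attached to this issue: the interface NamespaceDirPolicy, both implementing classes, and the hdfs://ns1..ns3 paths are hypothetical names invented for the example, and no existing YARN class or configuration key is implied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical policy interface (not a YARN API): picks one base dir per application. */
interface NamespaceDirPolicy {
    String choose(String applicationId);
}

/** Hash policy: the same application ID always maps to the same base dir. */
final class HashDirPolicy implements NamespaceDirPolicy {
    private final List<String> baseDirs;

    HashDirPolicy(List<String> baseDirs) {
        if (baseDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one base dir is required");
        }
        this.baseDirs = new ArrayList<>(baseDirs);
    }

    @Override
    public String choose(String applicationId) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int idx = Math.floorMod(applicationId.hashCode(), baseDirs.size());
        return baseDirs.get(idx);
    }
}

/** Round-robin policy: cycles through the base dirs, ignoring the application ID. */
final class RoundRobinDirPolicy implements NamespaceDirPolicy {
    private final List<String> baseDirs;
    private final AtomicLong next = new AtomicLong();

    RoundRobinDirPolicy(List<String> baseDirs) {
        if (baseDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one base dir is required");
        }
        this.baseDirs = new ArrayList<>(baseDirs);
    }

    @Override
    public String choose(String applicationId) {
        // The atomic counter makes concurrent callers take consecutive slots.
        int idx = (int) Math.floorMod(next.getAndIncrement(), (long) baseDirs.size());
        return baseDirs.get(idx);
    }
}

class NamespaceDirPolicyDemo {
    public static void main(String[] args) {
        // Made-up directories, one per federation namespace.
        List<String> dirs = Arrays.asList(
            "hdfs://ns1/tmp/logs", "hdfs://ns2/tmp/logs", "hdfs://ns3/tmp/logs");
        NamespaceDirPolicy hash = new HashDirPolicy(dirs);
        NamespaceDirPolicy roundRobin = new RoundRobinDirPolicy(dirs);
        for (String appId : Arrays.asList("application_1_0001", "application_1_0002")) {
            System.out.println(appId + " hash -> " + hash.choose(appId));
            System.out.println(appId + " round robin -> " + roundRobin.choose(appId));
        }
    }
}
```

One design consideration in this sketch: with the hash policy, a reader can recompute the directory from the application ID alone, whereas with round robin the chosen directory would have to be recorded somewhere so that the aggregated logs can be found later.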



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org