Posted to yarn-issues@hadoop.apache.org by "Chackaravarthy (JIRA)" <ji...@apache.org> on 2016/08/04 09:32:20 UTC

[jira] [Updated] (YARN-5445) Log aggregation configured to different namenode can fail fast

     [ https://issues.apache.org/jira/browse/YARN-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chackaravarthy updated YARN-5445:
---------------------------------
    Attachment: YARN-5445-1.patch

The attached patch makes LogAggregationService create a new YARN configuration and override {{dfs.client.failover.max.attempts}} in it before creating the FileSystem instance.

There are two open points to discuss:

* "dfs.client.failover.max.attempts" is hardcoded as a string in LogAggregationService because the constant for it, "DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY", is defined in DFSConfigKeys.java in the hadoop-hdfs project, and hadoop-yarn-server-nodemanager (which contains LogAggregationService) has no dependency on hadoop-hdfs. One option is to move "DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY" from DFSConfigKeys.java (hadoop-hdfs) to CommonConfigurationKeys.java (hadoop-common). Is there any other way?
* Only one new config is introduced, {{yarn.nodemanager.remote-app-log-dfs-client-failover-max-attempts}}, on the assumption that the NameNode is set up in HA mode. Other configs, such as "dfs.client.retry.max.attempts", still need to be checked for non-HA setups.
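
For discussion, here is a minimal sketch of the override logic described above. It is not the code from the patch: a plain java.util.Map stands in for the Hadoop configuration object so that the example runs without Hadoop on the classpath, and the class and method names (LogAggregationFailFastSketch, remoteLogConf) are hypothetical. Only the two property names come from this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class LogAggregationFailFastSketch {
    // HDFS client key, hardcoded as a string because the nodemanager module
    // cannot reference DFSConfigKeys (no dependency on hadoop-hdfs).
    static final String DFS_FAILOVER_MAX_ATTEMPTS =
        "dfs.client.failover.max.attempts";

    // New NodeManager-level key proposed in this issue.
    static final String NM_REMOTE_LOG_FAILOVER_MAX_ATTEMPTS =
        "yarn.nodemanager.remote-app-log-dfs-client-failover-max-attempts";

    /**
     * Returns a copy of the NodeManager config (a Map stand-in, an assumption
     * of this sketch) in which the HDFS failover limit is replaced by the
     * NodeManager-specific value, if one is set. The input is not modified,
     * so other HDFS clients in the same process keep their own settings.
     */
    static Map<String, String> remoteLogConf(Map<String, String> nmConf) {
        Map<String, String> copy = new HashMap<>(nmConf);
        String override = nmConf.get(NM_REMOTE_LOG_FAILOVER_MAX_ATTEMPTS);
        if (override != null) {
            copy.put(DFS_FAILOVER_MAX_ATTEMPTS, override.trim());
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> nmConf = new HashMap<>();
        nmConf.put(DFS_FAILOVER_MAX_ATTEMPTS, "15");
        nmConf.put(NM_REMOTE_LOG_FAILOVER_MAX_ATTEMPTS, "2");

        Map<String, String> fsConf = remoteLogConf(nmConf);
        System.out.println(fsConf.get(DFS_FAILOVER_MAX_ATTEMPTS)); // prints "2"
        System.out.println(nmConf.get(DFS_FAILOVER_MAX_ATTEMPTS)); // prints "15"
    }
}
```

In the NodeManager itself, the copy would presumably be a new Configuration built from the existing one and then passed to FileSystem.get for the remote log directory; the Map above only illustrates the copy-then-override step.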

Please check whether this is the right way to handle it. If not, please suggest an alternative. Thanks.

> Log aggregation configured to different namenode can fail fast
> --------------------------------------------------------------
>
>                 Key: YARN-5445
>                 URL: https://issues.apache.org/jira/browse/YARN-5445
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chackaravarthy
>         Attachments: YARN-5445-1.patch
>
>
> Log aggregation is enabled and configured to write application logs to a different cluster or a different namespace (NN federation). In these cases, we would like to have configs for attempts or retries so that it can fail fast when the other cluster is completely down.
> Currently it uses the default {{dfs.client.failover.max.attempts}} value of 15, which adds a latency of 2 to 2.5 minutes to each container launch (per NodeManager).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
