Posted to yarn-issues@hadoop.apache.org by "Yevhenii Semenov (JIRA)" <ji...@apache.org> on 2016/12/14 17:20:58 UTC

[jira] [Commented] (YARN-3112) AM restart and keep containers from previous attempts, then new container launch failed

    [ https://issues.apache.org/jira/browse/YARN-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748904#comment-15748904 ] 

Yevhenii Semenov commented on YARN-3112:
----------------------------------------

[~xtchenhui], thanks for your investigation and fix!

I hit a similar issue when I kill the AM process with {noformat}kill -9 process_id{noformat} and the RM recovers it. I'm not sure I'm dealing with the same root cause, but your fix helps me too. However, I would like to clarify one important thing. According to *"Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2"*:

{quote}
As for network optimization, NMTokens are not sent to the ApplicationMasters for each and every allocated container, but only for the first time or if NMTokens have to be invalidated due to the rollover of the underlying master key
{quote}

If you clear the node set in _"pullNewlyAllocatedContainersAndNMTokens"_, then the RM generates a new token for every allocated container. In my opinion, the fix may therefore cause a regression in this network optimization. What do you think?
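
To make the concern concrete, here is a minimal toy sketch of the bookkeeping described in the quote. It is not the actual RM code: the class {{NMTokenSketch}}, its methods, and the attempt ID string are hypothetical names invented for illustration, and master key rollover is deliberately left out. It only assumes that the RM remembers, per app attempt, which nodes have already been sent an NMToken.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: all names here are hypothetical, not Hadoop APIs.
public class NMTokenSketch {
    // Per app attempt: nodes that have already been sent an NMToken.
    private final Map<String, Set<String>> nodesWithToken = new HashMap<>();
    private int tokensIssued = 0;

    /** Records an allocation; returns true if a new token had to be issued. */
    public boolean allocate(String attemptId, String nodeId) {
        Set<String> nodes =
            nodesWithToken.computeIfAbsent(attemptId, k -> new HashSet<>());
        if (nodes.add(nodeId)) {   // first container on this node for the attempt
            tokensIssued++;
            return true;
        }
        return false;              // token already sent, nothing to resend
    }

    /** Forgets which nodes already have a token for this attempt. */
    public void clearNodeSet(String attemptId) {
        Set<String> nodes = nodesWithToken.get(attemptId);
        if (nodes != null) {
            nodes.clear();
        }
    }

    public int tokensIssued() {
        return tokensIssued;
    }

    public static void main(String[] args) {
        NMTokenSketch keepSet = new NMTokenSketch();
        NMTokenSketch clearSet = new NMTokenSketch();
        String attempt = "attempt_02";      // hypothetical attempt ID
        String node = "ixk02:47625";        // node from the stack trace below

        for (int i = 0; i < 5; i++) {
            keepSet.allocate(attempt, node);
            clearSet.allocate(attempt, node);
            clearSet.clearNodeSet(attempt); // mimics clearing on every pull
        }
        System.out.println("tokens issued, node set kept:    " + keepSet.tokensIssued());
        System.out.println("tokens issued, node set cleared: " + clearSet.tokensIssued());
    }
}
```

In this model, five allocations on the same node cost one token when the node set is kept and five when it is cleared after every pull, which is the regression I am worried about.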

I'm going to investigate the issue too. I will update the Jira if I find something interesting.

> AM restart and keep containers from previous attempts, then new container launch failed
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3112
>                 URL: https://issues.apache.org/jira/browse/YARN-3112
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications, resourcemanager
>    Affects Versions: 2.6.0
>         Environment: in real linux cluster
>            Reporter: Jack Chen
>
> This error is very similar to YARN-1795 and YARN-1839, but I have checked the solutions in those JIRAs, and the patches are already included in my version. I think this error is caused by differing NMTokens between the old and new app attempts. The new AM has inherited the old tokens from the previous AM according to my configuration (keepContainers=true), so the tokens for new containers are replaced by the old ones in the NMTokenCache.
> {noformat}
> 2015-01-29 10:04:49,603 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1422546145900_0001_02_000002 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for ixk02:47625
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:256)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:246)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:132)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:401)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:367)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)