Posted to yarn-issues@hadoop.apache.org by "Bibin A Chundatt (JIRA)" <ji...@apache.org> on 2017/07/12 08:36:00 UTC

[jira] [Comment Edited] (YARN-6803) AM registration could fail if event processing is delayed.

    [ https://issues.apache.org/jira/browse/YARN-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083640#comment-16083640 ] 

Bibin A Chundatt edited comment on YARN-6803 at 7/12/17 8:35 AM:
-----------------------------------------------------------------

{quote}
 Wondering if we could get a REGISTERED event before we get a LAUNCHED event if the launch processing is particularly slow in getting the event posted.
{quote}
IIUC, as [~sunilg] mentioned, a delayed response from the NodeManager could cause the REGISTERED event to arrive before the LAUNCHED event.
IMHO the probability of that is very low, though.

As per YARN-1214, sending the client token to the client before the credentials are saved is problematic.
So the fix could be to move the setting of the client master key to {{AttemptStoredTransition}}, before the launch call, and for an unmanaged AM to {{UnmanagedAMAttemptSavedTransition}}.
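
A minimal sketch of the ordering proposed above, assuming a much simplified model of the attempt state machine. {{AttemptKeyModel}}, its {{Event}} enum, and the {{registerOnStore}} flag are hypothetical names used only for illustration; this is not the code in {{RMAppAttemptImpl}} or in the attached patch. The key is still registered only after the attempt has been stored, which is the constraint from YARN-1214.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of master key registration for one RM. */
final class AttemptKeyModel {
  enum Event { ATTEMPT_STORED, LAUNCHED, REGISTERED }

  private final Map<String, byte[]> registeredKeys = new HashMap<>();
  private final boolean registerOnStore;

  /** registerOnStore=true models the proposed fix; false models the current behaviour. */
  AttemptKeyModel(boolean registerOnStore) {
    this.registerOnStore = registerOnStore;
  }

  private void register(String attemptId) {
    registeredKeys.put(attemptId, new byte[] {1, 2, 3}); // placeholder key bytes
  }

  /** Applies one event; returns false only when REGISTERED finds no master key. */
  boolean handle(String attemptId, Event event) {
    switch (event) {
      case ATTEMPT_STORED:
        // Proposed: register once the attempt is saved, before the launch call.
        if (registerOnStore) {
          register(attemptId);
        }
        return true;
      case LAUNCHED:
        // Current: register only when the LAUNCHED event is processed.
        if (!registerOnStore) {
          register(attemptId);
        }
        return true;
      case REGISTERED:
        return registeredKeys.containsKey(attemptId);
      default:
        throw new IllegalArgumentException("Unknown event: " + event);
    }
  }
}

final class AttemptKeyModelDemo {
  /** Replays STORED, REGISTERED, LAUNCHED and reports whether REGISTERED found a key. */
  static boolean registerSucceedsWhenLaunchIsLate(boolean registerOnStore) {
    AttemptKeyModel model = new AttemptKeyModel(registerOnStore);
    String attemptId = "attempt-1"; // made-up id
    model.handle(attemptId, AttemptKeyModel.Event.ATTEMPT_STORED);
    boolean ok = model.handle(attemptId, AttemptKeyModel.Event.REGISTERED);
    model.handle(attemptId, AttemptKeyModel.Event.LAUNCHED);
    return ok;
  }

  public static void main(String[] args) {
    System.out.println("current ordering:  " + registerSucceedsWhenLaunchIsLate(false));
    System.out.println("proposed ordering: " + registerSucceedsWhenLaunchIsLate(true));
  }
}
```

In this model a REGISTERED event that overtakes LAUNCHED finds a key only when registration happens at the stored transition.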

 [~jlowe], is it OK to continue the discussion in this JIRA, or should it continue in YARN-3260?


> AM registration could fail if event processing is delayed.
> ----------------------------------------------------------
>
>                 Key: YARN-6803
>                 URL: https://issues.apache.org/jira/browse/YARN-6803
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-6803.001.patch
>
>
> Steps to reproduce
> #  Submit application
> #  Delay processing of the application attempt's AM launch event
> #  Make the AM register before the AM launch event is fired
> {{DefaultAMSProcessor#registerApplicationMaster}} then fails with a NullPointerException while setting the client token master key:
> {code}
>     if (UserGroupInformation.isSecurityEnabled()) {
>       LOG.info("Setting client token master key");
>       response.setClientToAMTokenMasterKey(java.nio.ByteBuffer.wrap(
>           getRmContext().getClientToAMTokenSecretManager()
>           .getMasterKey(applicationAttemptId).getEncoded()));
>     }
> {code}
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException: java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.registerApplicationMaster(DefaultAMSProcessor.java:130)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:217)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
> 	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:177)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:121)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:978)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1280)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1733)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1729)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1660)
> {code}
> *Root Cause*
> The {{ClientToAMTokenSecretManagerInRM}} token master key is registered only after the AM launch event is fired, in {{AMLaunchedTransition}}:
> {code}
>       // register the ClientTokenMasterKey after it is saved in the store,
>       // otherwise client may hold an invalid ClientToken after RM restarts.
>       if (UserGroupInformation.isSecurityEnabled()) {
>         appAttempt.rmContext.getClientToAMTokenSecretManager()
>             .registerApplication(appAttempt.getAppAttemptId(),
>             appAttempt.getClientTokenMasterKey());
>       }
> {code}
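
The failure described in the issue can be modelled in isolation. The sketch below assumes that the secret manager returns null for an attempt that has not been registered yet, which is what the NullPointerException in the stack trace suggests. {{MasterKeyRegistry}} and {{RegisterBeforeLaunchDemo}} are hypothetical stand-ins, not Hadoop classes, and the attempt id and key bytes are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stand-in for the RM-side secret manager: attempt id to master key. */
final class MasterKeyRegistry {
  private final Map<String, SecretKey> keys = new ConcurrentHashMap<>();

  /** Models registerApplication(...), called when the LAUNCHED event is processed. */
  void registerApplication(String attemptId, SecretKey key) {
    keys.put(attemptId, key);
  }

  /** Models getMasterKey(...); assumed to return null for an unregistered attempt. */
  SecretKey getMasterKey(String attemptId) {
    return keys.get(attemptId);
  }
}

final class RegisterBeforeLaunchDemo {
  /** Models the unguarded lookup done while building the registration response. */
  static byte[] encodedMasterKey(MasterKeyRegistry registry, String attemptId) {
    // Throws NullPointerException when no key has been registered for the attempt.
    return registry.getMasterKey(attemptId).getEncoded();
  }

  public static void main(String[] args) {
    MasterKeyRegistry registry = new MasterKeyRegistry();
    String attemptId = "appattempt_0001_000001"; // made-up id

    try {
      // The AM registers before the LAUNCHED event has been processed.
      encodedMasterKey(registry, attemptId);
    } catch (NullPointerException e) {
      System.out.println("register before launch: NullPointerException");
    }

    // The LAUNCHED event is processed; the same lookup now succeeds.
    registry.registerApplication(
        attemptId, new SecretKeySpec(new byte[] {1, 2, 3}, "HmacSHA1"));
    System.out.println(
        "register after launch: " + encodedMasterKey(registry, attemptId).length + " bytes");
  }
}
```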


