You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "lujie (JIRA)" <ji...@apache.org> on 2019/05/30 02:04:00 UTC

[jira] [Updated] (YARN-9588) InvalidToken: appattempt_XXX not found in AMRMTokenSecretManager while RM reboot

     [ https://issues.apache.org/jira/browse/YARN-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-9588:
------------------------
    Description: 
HI:

while application is success, but before AM unregistered, RM reboot, then one error happens:
{code:java}
2019-05-29 18:55:11,112 ERROR [Thread-76] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Exception while unregistering
org.apache.hadoop.security.token.SecretManager$InvalidToken: appattempt_1559127208490_0001_000001 not found in AMRMTokenSecretManager.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy84.finishApplicationMaster(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.doUnregistration(RMCommunicator.java:227)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.unregister(RMCommunicator.java:189)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStop(RMCommunicator.java:267)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStop(RMContainerAllocator.java:327)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStop(MRAppMaster.java:985)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1868)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1309)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:668)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:747)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): appattempt_1559127208490_0001_000001 not found in AMRMTokenSecretManager.
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1547)
at org.apache.hadoop.ipc.Client.call(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1392)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy83.finishApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:92)
... 27 more

{code}
This will make the isLastAMRetry = false;   current AM won't clean up the staging dir:
{code:java}
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Skipping cleaning up the staging dir. assuming AM will be retried.{code}
But NO AM will be retried, because the application is succeeded
{code:java}
2019-05-29 18:55:15,458 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Successfully recovered 1 out of 1 applications
2019-05-29 18:55:15,459 INFO org.apache.hadoop.yarn.server.resourcemanager.security.ProxyCAManager: Recovering CA Certificate and Private Key
2019-05-29 18:55:15,464 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hires OPERATION=Application Finished - Succeeded TARGET=RMAppManager RESULT=SUCCESS APPID=application_1559127208490_0001
2019-05-29 18:55:15,472 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1559127208490_0001,name=word count,user=hires,queue=default,state=FINISHED,trackingUrl=http://hadoop11:8088/proxy/application_1559127208490_0001/,appMasterHost=N/A,submitTime=1559127223098,startTime=1559127223239,launchTime=1559127239133,finishTime=1559127288959,finalStatus=SUCCEEDED,memorySeconds=132664,vcoreSeconds=77,preemptedMemorySeconds=132664,preemptedVcoreSeconds=77,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE,resourceSeconds=132664 MB-seconds\, 77 vcore-seconds,preemptedResourceSeconds=132664 MB-seconds\, 77 vcore-seconds
2019-05-29 18:55:15,548 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery ended
{code}

  was:
HI:

while application is success, but before AM unregistered, RM reboot, then one error happens:
{code:java}
2019-05-29 18:55:11,112 ERROR [Thread-76] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Exception while unregistering
org.apache.hadoop.security.token.SecretManager$InvalidToken: appattempt_1559127208490_0001_000001 not found in AMRMTokenSecretManager.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy84.finishApplicationMaster(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.doUnregistration(RMCommunicator.java:227)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.unregister(RMCommunicator.java:189)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStop(RMCommunicator.java:267)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStop(RMContainerAllocator.java:327)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStop(MRAppMaster.java:985)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1868)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1309)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:668)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:747)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): appattempt_1559127208490_0001_000001 not found in AMRMTokenSecretManager.
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1547)
at org.apache.hadoop.ipc.Client.call(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1392)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy83.finishApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:92)
... 27 more

{code}
This will make the isLastAMRetry = false;   current AM won't clean up the staging dir:
{code:java}
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Skipping cleaning up the staging dir. assuming AM will be retried.
{code}


> InvalidToken: appattempt_XXX not found in AMRMTokenSecretManager while RM reboot
> --------------------------------------------------------------------------------
>
>                 Key: YARN-9588
>                 URL: https://issues.apache.org/jira/browse/YARN-9588
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Priority: Critical
>
> HI:
> while application is success, but before AM unregistered, RM reboot, then one error happens:
> {code:java}
> 2019-05-29 18:55:11,112 ERROR [Thread-76] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Exception while unregistering
> org.apache.hadoop.security.token.SecretManager$InvalidToken: appattempt_1559127208490_0001_000001 not found in AMRMTokenSecretManager.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
> at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:94)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy84.finishApplicationMaster(Unknown Source)
> at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.doUnregistration(RMCommunicator.java:227)
> at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.unregister(RMCommunicator.java:189)
> at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStop(RMCommunicator.java:267)
> at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStop(RMContainerAllocator.java:327)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
> at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStop(MRAppMaster.java:985)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
> at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
> at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
> at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
> at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1868)
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1309)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:668)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:747)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): appattempt_1559127208490_0001_000001 not found in AMRMTokenSecretManager.
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1547)
> at org.apache.hadoop.ipc.Client.call(Client.java:1493)
> at org.apache.hadoop.ipc.Client.call(Client.java:1392)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy83.finishApplicationMaster(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:92)
> ... 27 more
> {code}
> This will make the isLastAMRetry = false;   current AM won't clean up the staging dir:
> {code:java}
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Skipping cleaning up the staging dir. assuming AM will be retried.{code}
> But NO AM will be retried, because the application is succeeded
> {code:java}
> 2019-05-29 18:55:15,458 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Successfully recovered 1 out of 1 applications
> 2019-05-29 18:55:15,459 INFO org.apache.hadoop.yarn.server.resourcemanager.security.ProxyCAManager: Recovering CA Certificate and Private Key
> 2019-05-29 18:55:15,464 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hires OPERATION=Application Finished - Succeeded TARGET=RMAppManager RESULT=SUCCESS APPID=application_1559127208490_0001
> 2019-05-29 18:55:15,472 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1559127208490_0001,name=word count,user=hires,queue=default,state=FINISHED,trackingUrl=http://hadoop11:8088/proxy/application_1559127208490_0001/,appMasterHost=N/A,submitTime=1559127223098,startTime=1559127223239,launchTime=1559127239133,finishTime=1559127288959,finalStatus=SUCCEEDED,memorySeconds=132664,vcoreSeconds=77,preemptedMemorySeconds=132664,preemptedVcoreSeconds=77,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE,resourceSeconds=132664 MB-seconds\, 77 vcore-seconds,preemptedResourceSeconds=132664 MB-seconds\, 77 vcore-seconds
> 2019-05-29 18:55:15,548 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery ended
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org