Posted to yarn-issues@hadoop.apache.org by "zhihai xu (JIRA)" <ji...@apache.org> on 2015/03/03 22:17:07 UTC
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345747#comment-14345747 ]
zhihai xu commented on YARN-2893:
---------------------------------
I found another possible cause of this exception on non-secure clusters: the JobClient corrupting the tokens buffer.
The RM only parses (and thereby validates) the tokens buffer in RMAppManager#submitApplication when security is enabled:
{code}
// RMAppManager#submitApplication: credentials are parsed only when security is enabled
if (UserGroupInformation.isSecurityEnabled()) {
  try {
    this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
        parseCredentials(submissionContext),
        submissionContext.getCancelTokensWhenComplete(),
        application.getUser());
  } catch (Exception e) {
    LOG.warn("Unable to parse credentials.", e);
    // Sending APP_REJECTED is fine, since we assume that the
    // RMApp is in NEW state and thus we haven't yet informed the
    // scheduler about the existence of the application
    assert application.getState() == RMAppState.NEW;
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
    throw RPCUtil.getRemoteException(e);
  }
}
// ...

// RMAppManager#parseCredentials: deserializes the AM container's tokens buffer
protected Credentials parseCredentials(
    ApplicationSubmissionContext application) throws IOException {
  Credentials credentials = new Credentials();
  DataInputByteBuffer dibb = new DataInputByteBuffer();
  ByteBuffer tokens = application.getAMContainerSpec().getTokens();
  if (tokens != null) {
    dibb.reset(tokens);
    credentials.readTokenStorageStream(dibb); // throws EOFException on a truncated buffer
    tokens.rewind(); // reset the position so the buffer can be read again later
  }
  return credentials;
}
{code}
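parseCredentials reads the tokens buffer through DataInputByteBuffer and then calls tokens.rewind(). Assuming DataInputByteBuffer consumes bytes by advancing the position of the ByteBuffer it wraps (which would explain the rewind), a second reader of an un-rewound buffer sees no bytes at all. The JDK-only sketch below illustrates that hazard; ByteBufferInput is a hypothetical stand-in for DataInputByteBuffer, and the length-prefixed layout is a toy format, not Hadoop's real token-storage format.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RewindDemo {
    // Hypothetical stand-in for DataInputByteBuffer: an InputStream that
    // consumes bytes straight from the ByteBuffer, advancing its position.
    static final class ByteBufferInput extends InputStream {
        private final ByteBuffer buf;

        ByteBufferInput(ByteBuffer buf) {
            this.buf = buf;
        }

        @Override
        public int read() {
            return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (!buf.hasRemaining()) return -1;
            int n = Math.min(len, buf.remaining());
            buf.get(b, off, n);
            return n;
        }
    }

    // Toy layout (assumption): [int count] then count x [int length][bytes].
    static int parseTokens(ByteBuffer tokens) throws IOException {
        DataInputStream in = new DataInputStream(new ByteBufferInput(tokens));
        int count = in.readInt();          // EOFException if the buffer is exhausted
        for (int i = 0; i < count; i++) {
            byte[] token = new byte[in.readInt()];
            in.readFully(token);           // EOFException if the entry is truncated
        }
        return count;
    }

    static ByteBuffer sampleTokens() {
        byte[] token = "token-1".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + token.length);
        buf.putInt(1).putInt(token.length).put(token);
        buf.flip();                        // make the written bytes readable
        return buf;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer tokens = sampleTokens();
        System.out.println(parseTokens(tokens));   // prints 1
        try {
            parseTokens(tokens);                     // position is already at the limit
        } catch (EOFException e) {
            System.out.println("second read without rewind: EOFException");
        }
        tokens.rewind();                             // what parseCredentials does
        System.out.println(parseTokens(tokens));   // prints 1 again
    }
}
```

If some code path read the buffer without rewinding it, the AM launcher would see an apparently empty buffer and fail with the same kind of EOFException; whether that is what happens in the JobClient case is not established here.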
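I think RMAppManager#submitApplication should parse the tokens buffer on non-secure clusters too, so that a corrupt buffer fails the application at submission time instead of surfacing later as a confusing EOFException when the AM is launched.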
I also found a Cascading patch that fixes the credentials corruption in the JobClient:
https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e
I will update the patch to also check the tokens buffer for non-secure clusters in RMAppManager#submitApplication.
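A minimal sketch of the proposed submission-time check, assuming a toy token layout and hypothetical names (SubmitTimeTokenCheck, validateTokens, submit) rather than the real RMAppManager or Credentials API. The point it shows is that the parse runs with no isSecurityEnabled() guard, so a truncated buffer is rejected immediately. It parses a duplicate() of the buffer, which leaves the caller's position untouched; that is an alternative to the rewind() used in parseCredentials, not what the actual patch does.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

final class SubmitTimeTokenCheck {
    // Toy layout (assumption, not Hadoop's token-storage format):
    // [int count] followed by count x [int length][length bytes].
    static void validateTokens(ByteBuffer tokens) throws IOException {
        if (tokens == null) {
            return; // nothing to check, mirroring the null check in parseCredentials
        }
        // duplicate() shares the bytes but keeps its own position, so the
        // caller's buffer is not consumed and needs no rewind() afterwards.
        ByteBuffer view = tokens.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        ByteArrayInputStream raw = new ByteArrayInputStream(bytes);
        DataInputStream in = new DataInputStream(raw);
        int count = in.readInt(); // EOFException if fewer than 4 bytes remain
        if (count < 0) {
            throw new IOException("negative token count: " + count);
        }
        for (int i = 0; i < count; i++) {
            int len = in.readInt();
            // Reject impossible lengths before allocating anything.
            if (len < 0 || len > raw.available()) {
                throw new EOFException("truncated token");
            }
            in.readFully(new byte[len]);
        }
    }

    // Hypothetical submit path: the check is not wrapped in an
    // isSecurityEnabled() test, so non-secure clusters reject bad buffers too.
    static String submit(ByteBuffer tokens) {
        try {
            validateTokens(tokens);
        } catch (IOException e) {
            return "REJECTED: " + e; // analogous to the RMAppRejectedEvent in the secure branch
        }
        return "ACCEPTED";
    }

    public static void main(String[] args) {
        ByteBuffer good = ByteBuffer.allocate(12);
        good.putInt(1).putInt(4).put(new byte[] {1, 2, 3, 4});
        good.flip();
        ByteBuffer truncated = ByteBuffer.wrap(new byte[] {0, 0, 0, 1, 0, 0, 0, 4, 1});
        System.out.println(submit(good));       // ACCEPTED
        System.out.println(submit(truncated));  // REJECTED: java.io.EOFException: truncated token
        System.out.println(good.position());    // 0: the caller's buffer was not consumed
    }
}
```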
> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Gera Shegalov
> Assignee: zhihai xu
> Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)