You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "zhihai xu (JIRA)" <ji...@apache.org> on 2015/03/03 22:17:07 UTC

[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

    [ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345747#comment-14345747 ] 

zhihai xu commented on YARN-2893:
---------------------------------

I find there is another possibility which can also cause this exception for none-secure one: the JobClient corrupted the tokens buffer.
The RM code only check the tokens buffer in RMAppManager#submitApplication for secure one.
{code}
    if (UserGroupInformation.isSecurityEnabled()) {
      try {
        this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
            parseCredentials(submissionContext),
            submissionContext.getCancelTokensWhenComplete(),
            application.getUser());
      } catch (Exception e) {
        LOG.warn("Unable to parse credentials.", e);
        // Sending APP_REJECTED is fine, since we assume that the
        // RMApp is in NEW state and thus we haven't yet informed the
        // scheduler about the existence of the application
        assert application.getState() == RMAppState.NEW;
        this.rmContext.getDispatcher().getEventHandler()
          .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
        throw RPCUtil.getRemoteException(e);
      }

  protected Credentials parseCredentials(
      ApplicationSubmissionContext application) throws IOException {
    Credentials credentials = new Credentials();
    DataInputByteBuffer dibb = new DataInputByteBuffer();
    ByteBuffer tokens = application.getAMContainerSpec().getTokens();
    if (tokens != null) {
      dibb.reset(tokens);
      credentials.readTokenStorageStream(dibb);
      tokens.rewind();
    }
    return credentials;
  }
{code}
I think we should do the same for none-secure one, so we can fail the application earlier to avoid confusion.

Also I find out a cascading patch to fix the credentials corruption at the jobClient.
https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e

I will update the patch to check the  tokens buffer for for none-secure one in RMAppManager#submitApplication.

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>
>                 Key: YARN-2893
>                 URL: https://issues.apache.org/jira/browse/YARN-2893
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: zhihai xu
>         Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)