Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/06/11 18:33:02 UTC

[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

    [ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027979#comment-14027979 ] 

Jason Lowe commented on YARN-2147:
----------------------------------

For example, here's a sample log from a client submitting a job that failed:

{noformat}
2014-05-14 10:36:16,111 [JobControl] INFO org.apache.hadoop.mapred.ResourceMgrDelegate  - Submitted application application_1394826486018_9924515 to ResourceManager at xx/xx:xx
2014-05-14 10:36:16,116 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter  - Cleaning up the staging area /user/xx/.staging/job_1394826486018_9924515
2014-05-14 10:36:16,117 [JobControl] ERROR org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:xx (auth:SIMPLE) cause:java.io.IOException: Failed to run job : Read timed out
2014-05-14 10:36:16,118 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob  - xx got an error while submitting 
java.io.IOException: Failed to run job : Read timed out
                at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
                at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:410)
                at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
                at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
                at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
                at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:601)
                at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
                at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
{noformat}

All the user sees is a read timeout, with no details about where the connection was going or which service was involved. Was this a timeout connecting to the RM? A timeout on the RM side? Something else entirely? It's hard to tell from just "Read timed out". The exception logged on the RM side, with its full stacktrace, shows that the RM was timing out while trying to grab a delegation token from a remote server for webhdfs. Those kinds of details need to be conveyed back to the client, either via the full stacktrace from the RM exception or via a more informative exception message when delegation token renewal fails during app submission.
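A minimal sketch, in plain Java with no Hadoop dependencies, of the two remedies proposed above. The names renewToken, withContext, submit and the service string are hypothetical stand-ins, not the ResourceManager's actual code or API. The sketch only shows how wrapping the low-level IOException with the service name (while keeping it as the cause) turns a bare "Read timed out" into an actionable message, and how the full cause chain can be rendered as text.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.SocketTimeoutException;

/** Hypothetical sketch only; not the ResourceManager's real token renewal code. */
public class TokenRenewalErrors {

    /** Hypothetical stand-in for a remote token renewal call that times out. */
    static void renewToken(String service) throws IOException {
        throw new SocketTimeoutException("Read timed out");
    }

    /** Remedy 1: a more informative message naming the service, keeping the original exception as the cause. */
    static IOException withContext(String appId, String service, IOException cause) {
        return new IOException("Failed to renew delegation token for service "
                + service + " while submitting " + appId + ": " + cause.getMessage(), cause);
    }

    /** Remedy 2: render the full stack trace, including any "Caused by:" sections, as a string. */
    static String fullStackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    /** Hypothetical submission step that applies remedy 1 when renewal fails. */
    static void submit(String appId, String service) throws IOException {
        try {
            renewToken(service);
        } catch (IOException e) {
            throw withContext(appId, service, e);
        }
    }

    public static void main(String[] args) {
        String app = "application_1394826486018_9924515";
        String service = "webhdfs://remote-nn.example.com"; // hypothetical service name
        try {
            submit(app, service);
        } catch (IOException e) {
            // What the client effectively sees today: only the innermost message.
            System.out.println("Message only: " + e.getCause().getMessage());
            // What the client would see with the wrapped message.
            System.out.println("With context: " + e.getMessage());
            // What the client would see if the full stack trace were forwarded.
            System.out.println(fullStackTrace(e));
        }
    }
}
```

Keeping the original exception as the cause means the RM can log or forward either form without losing the root error.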

> client lacks delegation token exception details when application submit fails
> -----------------------------------------------------------------------------
>
>                 Key: YARN-2147
>                 URL: https://issues.apache.org/jira/browse/YARN-2147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Priority: Minor
>
> When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error.  Only the exception's message is conveyed to the client, which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)