Posted to mapreduce-issues@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2013/03/06 01:06:14 UTC

[jira] [Updated] (MAPREDUCE-3688) Need better Error message if AM is killed/throws exception

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated MAPREDUCE-3688:
------------------------------------

    Attachment: mapreduce-3688-h0.23-v01.patch

This has been a pain for our users as well.

I don't think this patch will fly well with the reviewers, but maybe it'll help move the discussion forward. 

I didn't see a good way of communicating the error message to the caller, so I decided to sacrifice stdout, which the current MRAppMaster does not use.
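
To make the idea concrete, here is a rough, self-contained sketch of the approach, not the code in the attached patch. It assumes the AM writes its startup failure to stdout and that whoever launched the container captures that stdout and folds it into the diagnostics string. The class and method names (AmStdoutDiagnostics, stackTraceOf, reportStartupFailure, buildDiagnostics) are hypothetical and exist only for illustration; no Hadoop APIs are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration only; this is NOT the code from the patch.
public class AmStdoutDiagnostics {

    /** Renders a throwable, including its "Caused by" chain, as text. */
    public static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /** AM side (assumed): write the startup failure to the otherwise unused stdout. */
    public static void reportStartupFailure(Throwable t, PrintStream out) {
        out.println("Error starting MRAppMaster: " + stackTraceOf(t));
        out.flush();
    }

    /** Launcher side (assumed): fold the captured stdout into the diagnostics text. */
    public static String buildDiagnostics(int exitCode, String capturedStdout) {
        return "exited with exitCode: " + exitCode + " due to: " + capturedStdout.trim();
    }

    public static void main(String[] args) {
        // Stand-in for the container's stdout as seen by the launcher.
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(captured, true);
        try {
            throw new IOException("Split metadata size exceeded 20.");
        } catch (IOException e) {
            // Wrap the cause, similar in shape to the YarnException in the trace.
            reportStartupFailure(new RuntimeException(e), out);
        }
        System.out.println(buildDiagnostics(1, captured.toString()));
    }
}
```

The trade-off in this sketch is the same one described above: stdout becomes a one-way error channel to the caller, so anything else the process prints there would end up in the diagnostics as well.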

After the patch, the web UI would show:

{quote}
Diagnostics: Application application_1362527487477_0005 failed 1 times due to AM Container for appattempt_1362527487477_0005_000001 exited with exitCode: 1 due to: Error starting MRAppMaster: org.apache.hadoop.yarn.YarnException: java.io.IOException: Split metadata size exceeded 20. Aborting job job_1362527487477_0005
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1290)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1146)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1118)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:823)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:121)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1094)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:998)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1273)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1269)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1226)
Caused by: java.io.IOException: Split metadata size exceeded 20. Aborting job job_1362527487477_0005
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1285)
    ... 16 more
.Failing this attempt.. Failing the application.
{quote}

(This patch is based on 0.23)
                
> Need better Error message if AM is killed/throws exception
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-3688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.1
>            Reporter: David Capwell
>            Assignee: Sandy Ryza
>             Fix For: 0.23.2
>
>         Attachments: mapreduce-3688-h0.23-v01.patch
>
>
> We need better error messages in the UI if the AM gets killed or throws an Exception.
> If the following error gets thrown: 
> java.lang.NumberFormatException: For input string: "9223372036854775807l" // last char is an L
> then the UI should show this exception.  Instead I get the following:
> Application application_1326504761991_0018 failed 1 times due to AM Container for appattempt_1326504761991_0018_000001
> exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException
