Posted to mapreduce-issues@hadoop.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2014/06/13 08:02:01 UTC

[jira] [Updated] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated MAPREDUCE-5924:
-----------------------------------

    Attachment: MAPREDUCE-5924.1.patch

Created a patch using option (1) to fix the problem quickly. We should also file a follow-up ticket to go through MR's @AtMostOnce protocol APIs and make them use the RetryCache. Once that is done, we can revert this quick fix.
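A retry cache makes a non-idempotent RPC safe to retry. The server remembers the outcome of each call, keyed by client ID and call ID, and replays that outcome when a retry of the same call arrives instead of executing it again. The comment above suggests that a duplicated commit-pending call is what produced the second TA_COMMIT_PENDING event. The Java sketch below shows the general idea. Its names (SimpleRetryCache, invokeAtMostOnce, CallKey) and its use of a String client ID are hypothetical assumptions, and it is not the API of Hadoop's RetryCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a server-side retry cache (NOT Hadoop's RetryCache API).
 * Each logical RPC is identified by (clientId, callId); a retried RPC reuses the
 * same pair, so it receives the cached result instead of re-running the operation.
 */
final class SimpleRetryCache<R> {
    /** Identity of one logical call; a client retry sends the same values again. */
    record CallKey(String clientId, int callId) {}

    private final Map<CallKey, R> completed = new ConcurrentHashMap<>();

    /**
     * Executes the operation at most once per (clientId, callId).
     * computeIfAbsent runs the supplier atomically for a given key, so a
     * concurrent retry of the same call waits for the first execution.
     * If the operation throws, nothing is cached and a retry re-executes it.
     */
    R invokeAtMostOnce(String clientId, int callId, Supplier<R> operation) {
        return completed.computeIfAbsent(new CallKey(clientId, callId), key -> operation.get());
    }
}
```

In this sketch a retried call with the same IDs returns the first result without running the operation again. A new call ID executes normally. A production cache would also have to expire entries, which this sketch never does.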

> Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5924
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5924.1.patch
>
>
> Posting the issue on behalf of [~yeshavora]:
> The Sort job over 1 GB of data failed with the error below:
> {code}
> 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1402304714683_0002 (auth:SIMPLE)
> 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1402304714683_0002_r_000015_1000
> 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1402304714683_0002_r_000015_1000
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:722)
> 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1402304714683_0002Job Transitioned from RUNNING to ERROR
> {code}
> The JobHistory URL shows job state = ERROR.
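The exception in the log is raised because the attempt's state machine has no transition registered for TA_COMMIT_PENDING once the attempt is already in COMMIT_PENDING. The minimal Java sketch below reproduces that behavior. It uses hypothetical, simplified types (AttemptStateMachine, AttemptState, AttemptEvent), not Hadoop's StateMachineFactory or TaskAttemptImpl. It also shows one assumed way to tolerate the duplicate event: a self-transition that leaves the state unchanged. This is only an illustration and is not necessarily what the attached patch does.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified states and events (not Hadoop's TaskAttemptStateInternal/TaskAttemptEventType).
enum AttemptState { RUNNING, COMMIT_PENDING, SUCCEEDED }
enum AttemptEvent { TA_COMMIT_PENDING, TA_DONE }

/** Table-driven state machine: an event with no registered transition from the current state is rejected. */
final class AttemptStateMachine {
    private final Map<AttemptState, Map<AttemptEvent, AttemptState>> table =
            new EnumMap<>(AttemptState.class);
    private AttemptState current = AttemptState.RUNNING;

    AttemptStateMachine(boolean ignoreDuplicateCommitPending) {
        addTransition(AttemptState.RUNNING, AttemptEvent.TA_COMMIT_PENDING, AttemptState.COMMIT_PENDING);
        addTransition(AttemptState.COMMIT_PENDING, AttemptEvent.TA_DONE, AttemptState.SUCCEEDED);
        if (ignoreDuplicateCommitPending) {
            // Self-transition: a repeated commit-pending report leaves the state unchanged.
            addTransition(AttemptState.COMMIT_PENDING, AttemptEvent.TA_COMMIT_PENDING, AttemptState.COMMIT_PENDING);
        }
    }

    private void addTransition(AttemptState from, AttemptEvent event, AttemptState to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(AttemptEvent.class)).put(event, to);
    }

    /** Applies an event and returns the new state, or throws if the transition is not registered. */
    AttemptState handle(AttemptEvent event) {
        AttemptState next = table.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

With `ignoreDuplicateCommitPending` set to false, a second TA_COMMIT_PENDING fails with the same message as in the log. With it set to true, the duplicate is absorbed and a later TA_DONE still moves the attempt to SUCCEEDED. Absorbing the duplicate only treats the symptom. The follow-up described above, deduplicating the retried RPC itself, would keep the second event from being generated at all.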



--
This message was sent by Atlassian JIRA
(v6.2#6252)