Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2012/10/26 02:05:12 UTC

[jira] [Updated] (MAPREDUCE-4748) Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4748:
----------------------------------

    Attachment: MAPREDUCE-4748.patch

Simple patch to ignore T_ATTEMPT_SUCCEEDED, T_KILL, and T_ATTEMPT_COMMIT_PENDING at SUCCEEDED and keep the job from abruptly ending in error.
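
To make the intent concrete, here is a minimal standalone sketch of the idea, not the patch itself. It does not use the real TaskImpl or StateMachineFactory; the class TaskStateSketch, its two-state enum, and the use of T_SCHEDULE as an example of an event that stays invalid at SUCCEEDED are assumptions made purely for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the task state machine; NOT the real TaskImpl.
class TaskStateSketch {
    enum State { RUNNING, SUCCEEDED }
    enum Event { T_ATTEMPT_SUCCEEDED, T_KILL, T_ATTEMPT_COMMIT_PENDING, T_SCHEDULE }

    // Events tolerated (no state change, no error) once the task has succeeded.
    private static final Set<Event> IGNORED_AT_SUCCEEDED = EnumSet.of(
        Event.T_ATTEMPT_SUCCEEDED, Event.T_KILL, Event.T_ATTEMPT_COMMIT_PENDING);

    private State state = State.RUNNING;

    State getState() {
        return state;
    }

    void handle(Event event) {
        switch (state) {
            case RUNNING:
                // Only the success transition is modeled; other events are
                // silently dropped here because they are out of scope for the sketch.
                if (event == Event.T_ATTEMPT_SUCCEEDED) {
                    state = State.SUCCEEDED;
                }
                break;
            case SUCCEEDED:
                // Late or redundant events are ignored instead of raising an error.
                if (!IGNORED_AT_SUCCEEDED.contains(event)) {
                    throw new IllegalStateException(
                        "Invalid event: " + event + " at " + state);
                }
                break;
        }
    }
}
```

With this shape, a second T_ATTEMPT_SUCCEEDED from a speculative attempt leaves the task in SUCCEEDED rather than failing the job, while events outside the ignored set still surface as errors.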

I'm a bit worried about the bookkeeping for task.finishedAttempts and task.numberUncompletedAttempts.  The current patch matches the bookkeeping behavior for T_ATTEMPT_KILLED or T_ATTEMPT_FAILED when we're effectively ignoring the event.  However, I'm wondering if this could lead to corner cases during KILL_WAIT like those reported in MAPREDUCE-4745.

It looks like TaskAttempt will report T_ATTEMPT_KILLED after it succeeded, but only for map tasks.  We don't want to double-count in that case, but if a kill of the TaskAttempt doesn't report that it was killed, it seems we could miss some bookkeeping if we simply skip the bookkeeping whenever an attempt redundantly succeeds.  Thoughts?
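
As a point of comparison only, one way to make the double-counting question moot would be to track attempts by ID instead of incrementing counters, so that a repeated report for the same attempt is a no-op. The sketch below is hypothetical and is not what the attached patch does; the class AttemptBookkeepingSketch, its method names, and the use of plain strings as attempt IDs are all assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical bookkeeping sketch; it does not mirror the real TaskImpl fields.
class AttemptBookkeepingSketch {
    private final Set<String> launched = new HashSet<>();
    private final Set<String> finished = new HashSet<>();

    void attemptLaunched(String attemptId) {
        launched.add(attemptId);
    }

    // Safe to call more than once per attempt (e.g. succeeded, then killed).
    // Returns true only the first time the attempt is recorded as finished.
    boolean attemptFinished(String attemptId) {
        return finished.add(attemptId);
    }

    int finishedAttempts() {
        return finished.size();
    }

    // Launched attempts that have not yet been recorded as finished.
    int numberUncompletedAttempts() {
        Set<String> pending = new HashSet<>(launched);
        pending.removeAll(finished);
        return pending.size();
    }
}
```

Because both counts are derived from sets, a map attempt that reports success and later reports T_ATTEMPT_KILLED is counted once, and an attempt that never reports anything simply stays in the uncompleted set.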
                
> Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4748
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4748
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-4748.patch
>
>
> We saw this happen when running a large pig script.
> {noformat}
> 2012-10-23 22:45:24,986 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event at current state for task_1350837501057_21978_m_040453
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Speculative execution was enabled, and that task did speculate, so this looks like an error in the state machine, either between the task attempts or within that single task.
