Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/03/14 17:08:13 UTC

[jira] [Reopened] (MAPREDUCE-4992) AM hangs in RecoveryService when recovering tasks with speculative attempts

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe reopened MAPREDUCE-4992:
-----------------------------------


This is still occurring in a number of ways:

* If the attempt that succeeded was attempt 1 but the history file has no completion event for attempt 0, the RecoveryService recovers only attempt 0 yet waits for attempt 1 to complete.
* If two task attempts succeed simultaneously, it recovers only attempt 0 yet waits for attempt 1 to complete.
* If the prior AM attempt was backed up in event processing and scheduled speculative task attempts *after* a task attempt had already completed, the RecoveryService waits on those speculative attempts even though they were never launched.

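In each of these cases, the attempts the RecoveryService waits on are not a subset of the attempts it actually recovers. The toy Python model below is a hypothetical sketch, not the MRAppMaster code: `RecoveredTask`, its `replayed` and `awaited` fields, `blocking_tasks`, and the shortened task IDs are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveredTask:
    """Hypothetical per-task view of a restarted AM (toy model, not Hadoop code)."""

    task_id: str
    replayed: frozenset[int]  # attempt numbers whose history events get replayed
    awaited: frozenset[int]   # attempt numbers recovery waits on to finish


def blocking_tasks(tasks: list[RecoveredTask]) -> dict[str, set[int]]:
    """Return, per blocking task, the attempts that are awaited but never replayed.

    In this model an awaited attempt with no replayed events never reports
    completion, so its task never counts as done and recovery cannot exit.
    """
    blocked: dict[str, set[int]] = {}
    for task in tasks:
        missing = set(task.awaited - task.replayed)
        if missing:
            blocked[task.task_id] = missing
    return blocked


# The three reopened cases, with shortened hypothetical task IDs.
scenarios = [
    # Case 1: attempt 1 succeeded, no completion event for attempt 0 in history;
    # only attempt 0 is recovered, but attempt 1 is awaited.
    RecoveredTask("m_000001", replayed=frozenset({0}), awaited=frozenset({1})),
    # Case 2: attempts 0 and 1 succeeded simultaneously; only attempt 0 is recovered.
    RecoveredTask("m_000002", replayed=frozenset({0}), awaited=frozenset({0, 1})),
    # Case 3: speculative attempt 1 was scheduled after attempt 0 completed
    # but never launched, so there is nothing to replay for it.
    RecoveredTask("m_000003", replayed=frozenset({0}), awaited=frozenset({0, 1})),
    # A healthy task for comparison: every awaited attempt is replayed.
    RecoveredTask("m_000004", replayed=frozenset({0}), awaited=frozenset({0})),
]

print(blocking_tasks(scenarios))
# prints {'m_000001': {1}, 'm_000002': {1}, 'm_000003': {1}}
```

In this model, recovery can only finish when, for every task, each awaited attempt is also one whose events get replayed.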
> AM hangs in RecoveryService when recovering tasks with speculative attempts
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4992
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4992
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: trunk, 2.0.2-alpha, 0.23.6
>            Reporter: Robert Parker
>            Assignee: Robert Parker
>            Priority: Critical
>             Fix For: 0.23.7, 2.0.5-beta
>
>         Attachments: MAPREDUCE-4992v1.patch, MAPREDUCE-4992v2.patch
>
>
> A job hung in the RecoveryService on an AM restart. Four map task events were not processed, so the count of outstanding tasks never reached zero, which is the condition for exiting the RecoveryService. All four tasks were speculative.
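The exit condition described above behaves like a countdown over outstanding tasks. The following is a minimal Python sketch under stated assumptions, not the actual RecoveryService code: the function name `replay_recovery`, the string task IDs, and counting each task at most once are all hypothetical. Instead of blocking the way a real service would, it returns how many tasks are still pending once the replayed events run out.

```python
from __future__ import annotations

from collections import deque


def replay_recovery(num_tasks: int, completion_events: list[str]) -> int:
    """Toy model of a recovery phase that exits once no tasks remain pending.

    Hypothetical sketch, not the RecoveryService implementation. Each event is
    the ID of a task reporting completion. Returns the number of tasks still
    pending when the replayed events run out; a real service blocking on its
    event queue would hang whenever this is non-zero.
    """
    pending = num_tasks
    completed: set[str] = set()
    queue = deque(completion_events)
    while pending > 0 and queue:
        task_id = queue.popleft()
        # Count each task once, even if several of its attempts report success.
        if task_id not in completed:
            completed.add(task_id)
            pending -= 1
    return pending


all_tasks = [f"m_{i:06d}" for i in range(6)]
# Suppose the completion events for four (speculative) tasks are never processed.
print(replay_recovery(len(all_tasks), all_tasks[:2]))  # 4: recovery never exits
print(replay_recovery(len(all_tasks), all_tasks))      # 0: recovery exits normally
```

Counting each task at most once is a design choice of this sketch: when two attempts of the same task both succeed, a per-attempt counter could otherwise be decremented twice for one task.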
