Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2007/09/10 12:50:29 UTC
[jira] Commented: (HADOOP-1862) reduces are getting stuck trying to find map outputs
[ https://issues.apache.org/jira/browse/HADOOP-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526123 ]
Arun C Murthy commented on HADOOP-1862:
---------------------------------------
Hmm... one straw to clutch:
{noformat}
$ cat 1862-event.log | grep task_200709041519_0023_m_001149
OBSOLETE task_200709041519_0023_m_001149_0 http://a.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_0
FAILED task_200709041519_0023_m_001149_0 null
SUCCEEDED task_200709041519_0023_m_001149_1 http://b.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_1
SUCCEEDED task_200709041519_0023_m_001149_2 http://c.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_2
$ cat 1862-event.log | grep task_200709041519_0023_m_001816
OBSOLETE task_200709041519_0023_m_001816_0 http://x.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_0
FAILED task_200709041519_0023_m_001816_0 null
SUCCEEDED task_200709041519_0023_m_001816_1 http://y.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_1
SUCCEEDED task_200709041519_0023_m_001816_2 http://z.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_2
{noformat}
Essentially, in {{JobInProgress.updateTaskStatuses(TaskInProgress, TaskStatus, JobTrackerMetrics)}}, a {{TaskCompletionEvent.Status.SUCCEEDED}} event is added regardless of whether the TIP is already complete, so each reducer sees 2 {{TaskCompletionEvent.Status.SUCCEEDED}} events for the same map, as above. Clearly the fetch from one of them will fail, since either _1 or _2 will be {{KILLED}}. Not a happy situation.
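To make the suspected sequence concrete, here is a minimal, self-contained sketch. It is not the actual {{JobInProgress}} code: the {{EventLog}} class, its {{guardCompletedTips}} flag, and the choice to record the late attempt as {{KILLED}} are all assumptions made for illustration. It only models the idea that a second successful attempt of an already-complete TIP should not produce a second {{SUCCEEDED}} event.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateSuccessSketch {
    enum Status { SUCCEEDED, FAILED, KILLED, OBSOLETE }

    /** Simplified, hypothetical stand-in for a task completion event. */
    static final class Event {
        final Status status;
        final String attemptId;

        Event(Status status, String attemptId) {
            this.status = status;
            this.attemptId = attemptId;
        }

        @Override
        public String toString() {
            return status + " " + attemptId;
        }
    }

    /** Hypothetical model of the per-job event list that reducers poll. */
    static final class EventLog {
        private final boolean guardCompletedTips;
        private final Set<String> completedTips = new HashSet<>();
        private final List<Event> events = new ArrayList<>();

        EventLog(boolean guardCompletedTips) {
            this.guardCompletedTips = guardCompletedTips;
        }

        /** Called when an attempt of the given TIP reports success. */
        void attemptSucceeded(String tipId, String attemptId) {
            // Set.add returns false if the TIP was already marked complete.
            boolean alreadyComplete = !completedTips.add(tipId);
            if (alreadyComplete && guardCompletedTips) {
                // Assumption: the late attempt is recorded as KILLED instead,
                // so reducers never try to fetch from it.
                events.add(new Event(Status.KILLED, attemptId));
                return;
            }
            // Without the guard, every successful attempt yields SUCCEEDED.
            events.add(new Event(Status.SUCCEEDED, attemptId));
        }

        int succeededCount() {
            int count = 0;
            for (Event e : events) {
                if (e.status == Status.SUCCEEDED) {
                    count++;
                }
            }
            return count;
        }

        List<Event> events() {
            return events;
        }
    }

    /** Replays attempts _1 and _2 of one map, both reporting success. */
    static EventLog replay(boolean guardCompletedTips) {
        EventLog log = new EventLog(guardCompletedTips);
        String tip = "task_200709041519_0023_m_001149";
        log.attemptSucceeded(tip, tip + "_1");
        log.attemptSucceeded(tip, tip + "_2");
        return log;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + replay(false).events());
        System.out.println("guarded:   " + replay(true).events());
    }
}
```

With the guard off, the replay leaves two {{SUCCEEDED}} events for the same map, which matches the log excerpt above; with it on, only the first attempt is advertised to the reducers. Whether the real fix belongs at this point in the JobTracker is still an open question.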
Like I said, I'll try to dig deeper; maybe this will help someone beat me to it. *smile*
> reduces are getting stuck trying to find map outputs
> ----------------------------------------------------
>
> Key: HADOOP-1862
> URL: https://issues.apache.org/jira/browse/HADOOP-1862
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.14.1
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.15.0
>
>
> Some of the reduces have been stuck for hours looking for 137 map outputs. When I look at the job events, all 2600 of the maps have succeeded. There have been lots of lost task trackers and shuffle failures. The maps have been run between 1 and 6 times each. I do see that some of the events in the task event log are marked OBSOLETE.