Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2007/03/13 04:40:09 UTC

[jira] Resolved: (HADOOP-924) Map task is not getting rescheduled although the corresponding TT got lost

     [ https://issues.apache.org/jira/browse/HADOOP-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-924.
----------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.12.1

This looks like a duplicate of HADOOP-1060.

> Map task is not getting rescheduled although the corresponding TT got lost
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-924
>                 URL: https://issues.apache.org/jira/browse/HADOOP-924
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>             Fix For: 0.12.1
>
>
> I encountered this "job hung" situation during one of the sort runs. Two tasks assigned to a TT were never rescheduled even though the TT was lost, and this left the job stuck forever. This TT had been assigned many tasks, and every one of them was rescheduled except these two. Here are the relevant log messages for one of the tasks (the JT log below has been split into two parts to bring out the sequence of events).
> JT log:
> ---------
> 2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_020699
> 2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_020699_0' to tip tip_0001_m_020699, for tracker 'foo.com:7020'
> TT log:
> ---------
> 2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_020699_0
> 2007-01-24 10:53:12,180 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_020699_0 0.0% hdfs://foo:50000/user/ddas/somedir/part002444:134217728+134217728
> JT log:
> ---------
> 2007-01-24 11:05:32,409 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'foo.com:7020'
> This looks like a race condition. Since only two of the many tasks were never rescheduled, it could mean that the JT was somehow unaware of the state of these two tasks after it assigned them to the (soon-to-be-lost) TT (were they added to the relevant tables properly?).
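The suspicion above can be illustrated with a minimal Java sketch, assuming the JobTracker reschedules a lost tracker's tasks by looking them up in a tracker-to-task table. The class and method names (`LostTrackerSketch`, `assign`, `assignUnrecorded`, `lostTracker`, `pendingTasks`) are hypothetical and are not Hadoop's actual API; the sketch only shows why a task that was launched on a tracker but never entered into such a table would never be rescheduled when that tracker is lost.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of JobTracker bookkeeping; NOT Hadoop's real classes.
public class LostTrackerSketch {
    // tracker name -> ids of tasks currently assigned to it
    private final Map<String, Set<String>> trackerToTasks = new HashMap<>();
    // tasks waiting to be (re)scheduled
    private final Deque<String> pending = new ArrayDeque<>();

    // Assign a task and record it in the tracker table in one synchronized
    // step, so a later lost-tracker scan cannot miss it.
    public synchronized void assign(String tracker, String taskId) {
        trackerToTasks.computeIfAbsent(tracker, t -> new HashSet<>()).add(taskId);
    }

    // Models the suspected bug: the launch action would be sent to the
    // tracker, but the table entry is never added.
    public synchronized void assignUnrecorded(String tracker, String taskId) {
        // intentionally does not touch trackerToTasks
    }

    // Reschedule every task the table associates with the lost tracker.
    public synchronized void lostTracker(String tracker) {
        Set<String> tasks = trackerToTasks.remove(tracker);
        if (tasks != null) {
            pending.addAll(tasks);
        }
    }

    public synchronized Set<String> pendingTasks() {
        return new HashSet<>(pending);
    }

    public static void main(String[] args) {
        LostTrackerSketch jt = new LostTrackerSketch();
        jt.assign("foo.com:7020", "task_recorded");
        jt.assignUnrecorded("foo.com:7020", "task_unrecorded");
        jt.lostTracker("foo.com:7020");
        System.out.println(jt.pendingTasks()); // prints [task_recorded]
    }
}
```

In this model the unrecorded task silently disappears: nothing requeues it, so a job waiting on it would hang, which matches the symptom described in the report. Whether the real JobTracker had such a gap is exactly the open question raised above.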

-- 
This message is automatically generated by JIRA.