Posted to common-dev@hadoop.apache.org by "Andreas Kostyrka (JIRA)" <ji...@apache.org> on 2008/08/07 09:07:44 UTC
[jira] Updated: (HADOOP-3915) reducers hang, jobtracker loosing completely track of them.

     [ https://issues.apache.org/jira/browse/HADOOP-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Kostyrka updated HADOOP-3915:
-------------------------------------

    Attachment: hadoop-hadoop-jobtracker-ec2-67-202-58-97.compute-1.amazonaws.com.log

This is the JobTracker log. It shows the attempts I made to use hadoop job -kill-task.

You will also notice that the reducers in question get started but then disappear:

andreas@andi-lap:/tmp/bug$ grep tip_200808070013_0001_r_000020 hadoop-hadoop-jobtracker-ec2-67-202-58-97.compute-1.amazonaws.com.log
2008-08-07 00:15:07,184 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200808070013_0001_r_000020_0' to tip tip_200808070013_0001_r_000020, for tracker 'tracker_ec2-75-101-217-112.compute-1.amazonaws.com:localhost/127.0.0.1:53783'
2008-08-07 05:51:35,978 INFO org.apache.hadoop.mapred.JobTracker: Kill task attempt failed since task tip_200808070013_0001_r_000020 was not found

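The same check can be scripted instead of grepping one TIP at a time. This is a minimal Python sketch, assuming the JobTracker log uses exactly the two message formats quoted above. It lists every TIP that was assigned to a tracker but later reported as "not found" by a kill request. The function name find_lost_tips is hypothetical and not part of Hadoop.

```python
import re
import sys

# Patterns copied from the two JobTracker lines quoted above; other Hadoop
# releases may word these messages differently (assumption).
ADDED = re.compile(
    r"Adding task '(?P<attempt>[^']+)' to tip (?P<tip>tip_\w+), "
    r"for tracker '(?P<tracker>[^']+)'"
)
NOT_FOUND = re.compile(
    r"Kill task attempt failed since task (?P<tip>tip_\w+) was not found"
)


def find_lost_tips(lines):
    """Return {tip: tracker} for TIPs that were assigned to a tracker but
    later reported as 'not found' when a kill was attempted."""
    assigned = {}  # tip -> tracker it was last assigned to
    lost = {}
    for line in lines:
        m = ADDED.search(line)
        if m:
            assigned[m.group("tip")] = m.group("tracker")
            continue
        m = NOT_FOUND.search(line)
        if m and m.group("tip") in assigned:
            lost[m.group("tip")] = assigned[m.group("tip")]
    return lost


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_lost_tips.py <jobtracker log file>
    with open(sys.argv[1]) as log:
        for tip, tracker in sorted(find_lost_tips(log).items()):
            print(f"{tip}\tlast assigned to {tracker}")
```

Run against the attached log, this should report tip_200808070013_0001_r_000020 together with the tracker it was last assigned to.
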
The last mapper finished at or before about 3:30 GMT (unless I overlooked one),
yet the reducers kept producing a long repetition of:

2008-08-07 07:02:39,191 INFO org.apache.hadoop.mapred.ReduceTask: task_200808070013_0001_r_000013_0 Got 0 known map output location(s); scheduling...
2008-08-07 07:02:39,191 INFO org.apache.hadoop.mapred.ReduceTask: task_200808070013_0001_r_000013_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2008-08-07 07:02:44,200 INFO org.apache.hadoop.mapred.ReduceTask: task_200808070013_0001_r_000013_0 Need 67 map output(s)
2008-08-07 07:02:44,200 INFO org.apache.hadoop.mapred.ReduceTask: task_200808070013_0001_r_000013_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures

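The repetition can also be summarized per reduce attempt. This is a minimal Python sketch, assuming the ReduceTask messages match the four lines quoted above. For each attempt it counts how many polls found zero known map output locations and records the most recent "Need N map output(s)" value. The function name summarize_stalls is hypothetical and not part of Hadoop.

```python
import re
from collections import Counter

# Formats copied from the ReduceTask lines quoted above; other Hadoop
# releases may log these differently (assumption).
ZERO_LOCATIONS = re.compile(
    r"ReduceTask: (?P<attempt>task_\w+) Got 0 known map output location\(s\)"
)
NEED = re.compile(r"ReduceTask: (?P<attempt>task_\w+) Need (?P<n>\d+) map output\(s\)")


def summarize_stalls(lines):
    """Return {attempt: (empty_polls, last_needed)} for reduce attempts that
    reported zero known map output locations at least once."""
    empty_polls = Counter()  # attempt -> number of empty polls
    needed = {}              # attempt -> last reported number of outputs needed
    for line in lines:
        m = ZERO_LOCATIONS.search(line)
        if m:
            empty_polls[m.group("attempt")] += 1
            continue
        m = NEED.search(line)
        if m:
            needed[m.group("attempt")] = int(m.group("n"))
    return {a: (count, needed.get(a)) for a, count in empty_polls.items()}
```

A large poll count paired with an unchanged "Need" value for an attempt matches the stalled state described here.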

> reducers hang, jobtracker loosing completely track of them.
> -----------------------------------------------------------
>
>                 Key: HADOOP-3915
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3915
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.1
>         Environment: EC2, Debian Etch  (but not the ec2-contrib stuff)
> streaming.jar
>            Reporter: Andreas Kostyrka
>             Fix For: 0.17.2
>
>         Attachments: hadoop-hadoop-jobtracker-ec2-67-202-58-97.compute-1.amazonaws.com.log
>
>
> I just noticed the following curious situation:
> -) 18 of 22 reducers have been waiting for about 3 hours at 0.01 MB/s with no progress.
> -) hadoop job -kill-task does not work on the task IDs shown.
> -) Killing all reduce worker processes (the spawned Python processes, not the java TaskTracker$Child) is completely ignored by the JobTracker, which still shows them as running.

-- 
This message is automatically generated by JIRA.