You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/01/29 21:10:49 UTC

[jira] Commented: (HADOOP-949) reduce hangs after applying HADOOP-248 on 20 node cluster

    [ https://issues.apache.org/jira/browse/HADOOP-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468396 ] 

Nigel Daley commented on HADOOP-949:
------------------------------------

Task tracker log:

... [reduce task launches]
2007-01-27 04:35:49,483 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0005_r_000000_0
2007-01-27 04:35:49,554 INFO org.apache.hadoop.dfs.DistributedFileSystem: Initialized InMemoryFileSystem: ramfs://mapoutput10141941 of size (in bytes): 209715200
2007-01-27 04:35:49,555 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Created an InMemoryFileSystem, uri: ramfs://mapoutput10141941
2007-01-27 04:35:49,575 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 200 map output(s)
2007-01-27 04:35:49,575 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 200 map output location(s)
2007-01-27 04:35:49,576 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Started thread: Map output copy reporter for task task_0005_r_000000_0
2007-01-27 04:35:49,584 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-27 04:35:49,585 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 1 known map output location(s); scheduling...
2007-01-27 04:35:49,585 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 1 of 1 known outputs (0 slow hosts and 0 dup hosts)
2007-01-27 04:35:49,585 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Copying task_0005_m_000002_0 output from tt1.com.
2007-01-27 04:35:49,736 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_m_000000_0 0.0% writing test_io_100@134209536/134217728 ::host = tt2.com
2007-01-27 04:35:49,809 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 done copying task_0005_m_000002_0 output from tt1.com.
2007-01-27 04:35:50,204 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 199 map output(s)
2007-01-27 04:35:50,204 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 199 map output location(s)
2007-01-27 04:35:50,578 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.0016666667% reduce > copy (1 of 200 at 0.00 MB/s) >
... [map outputs get copied]
2007-01-27 04:39:44,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 50 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-27 04:39:44,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2007-01-27 04:39:44,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2007-01-27 04:39:44,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 153 map output(s)
2007-01-27 04:39:44,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output location(s)
2007-01-27 04:39:44,703 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.07833333% reduce > copy (47 of 200 at 0.00 MB/s) >
2007-01-27 04:39:45,705 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.07833333% reduce > copy (47 of 200 at 0.00 MB/s) >
2007-01-27 04:39:46,707 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.07833333% reduce > copy (47 of 200 at 0.00 MB/s) >
2007-01-27 04:39:47,708 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.07833333% reduce > copy (47 of 200 at 0.00 MB/s) >
2007-01-27 04:39:48,710 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.07833333% reduce > copy (47 of 200 at 0.00 MB/s) >
2007-01-27 04:39:49,268 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 50 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-27 04:39:49,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2007-01-27 04:39:49,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2007-01-27 04:39:49,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 152 map output(s)
2007-01-27 04:39:49,269 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output location(s)
2007-01-27 04:39:49,712 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08% reduce > copy (48 of 200 at 0.00 MB/s) >
2007-01-27 04:39:50,715 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08% reduce > copy (48 of 200 at 0.00 MB/s) >
2007-01-27 04:39:51,718 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08% reduce > copy (48 of 200 at 0.00 MB/s) >
2007-01-27 04:39:52,721 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08% reduce > copy (48 of 200 at 0.00 MB/s) >
2007-01-27 04:39:53,723 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08% reduce > copy (48 of 200 at 0.00 MB/s) >
2007-01-27 04:39:54,270 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 50 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-27 04:39:54,270 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2007-01-27 04:39:54,271 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2007-01-27 04:39:54,271 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 151 map output(s)
2007-01-27 04:39:54,271 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output location(s)
2007-01-27 04:39:54,726 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08166666% reduce > copy (49 of 200 at 0.00 MB/s) >
2007-01-27 04:39:55,728 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08166666% reduce > copy (49 of 200 at 0.00 MB/s) >
2007-01-27 04:39:56,730 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08166666% reduce > copy (49 of 200 at 0.00 MB/s) >
2007-01-27 04:39:57,732 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08166666% reduce > copy (49 of 200 at 0.00 MB/s) >
2007-01-27 04:39:58,733 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.08166666% reduce > copy (49 of 200 at 0.00 MB/s) >
2007-01-27 04:39:59,272 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 50 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-27 04:39:59,272 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2007-01-27 04:39:59,272 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2007-01-27 04:39:59,273 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output(s)
2007-01-27 04:39:59,273 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output location(s)
2007-01-27 04:39:59,735 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:00,737 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:01,739 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:02,741 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:03,743 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:04,272 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 50 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-27 04:40:04,272 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2007-01-27 04:40:04,272 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2007-01-27 04:40:04,745 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:05,748 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:06,751 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:07,754 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-27 04:40:08,757 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
... [The next chunk is repeated forever]
2007-01-28 02:00:54,509 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output(s)
2007-01-28 02:00:54,509 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Need 150 map output location(s)
2007-01-28 02:00:54,513 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 50 new map outputs from jobtracker and 0 map outputs from previous failures
2007-01-28 02:00:54,513 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Got 0 known map output location(s); scheduling...
2007-01-28 02:00:54,513 INFO org.apache.hadoop.mapred.TaskRunner: task_0005_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2007-01-28 02:00:54,595 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-28 02:00:55,596 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-28 02:00:56,598 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-28 02:00:57,600 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
2007-01-28 02:00:58,602 INFO org.apache.hadoop.mapred.TaskTracker: task_0005_r_000000_0 0.083333336% reduce > copy (50 of 200 at 0.00 MB/s) >
...[The webui shows the reduce as 8.33% complete]



> reduce hangs after applying HADOOP-248 on 20 node cluster
> ---------------------------------------------------------
>
>                 Key: HADOOP-949
>                 URL: https://issues.apache.org/jira/browse/HADOOP-949
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Owen O'Malley
>
> After HADOOP-248 was committed, my benchmarks started to hang.  Running TestDFSIO on 20 nodes, the job's single reduce is hanging.  Annodated logs from the task tracker running the reduce are included in the next comment.  I have reverted the patch using:
>    cd $trunk_dir
>    svn merge -r 500410:500397 .
> and reran some benchmarks.  They all pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.