Posted to common-user@hadoop.apache.org by John Heidemann <jo...@isi.edu> on 2007/07/05 20:10:31 UTC

"Broken pipe" in hadoop streaming...looking for debugging hints

I'm running hadoop streaming from svn (version 552930, reasonably
recent).  My map/reduce job maps ~1M records, but then a few reduces
succeed and many fail, eventually terminating the job unsuccessfully.
I'm looking for some debugging hints.


The failures are all "broken pipe":

java.io.IOException: R/W/S=1/0/0 in:0=1/342 [rec/s] out:0=0/342 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=hadoop
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Thu Jul 05 10:46:28 PDT 2007
Broken pipe
	at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:91)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:323)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1763)


The log looks reasonable (to my naive eyes):

2007-07-05 10:46:24,248 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Need 2 map output(s)
2007-07-05 10:46:25,105 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Got 3 new map outputs from tasktracker and 0 map outputs from previous failures
2007-07-05 10:46:25,105 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Got 6 known map output location(s); scheduling...
2007-07-05 10:46:25,106 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Scheduled 5 of 6 known outputs (0 slow hosts and 1 dup hosts)
2007-07-05 10:46:25,106 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying task_0004_m_000496_1 output from rim.isi.edu.
2007-07-05 10:46:25,106 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying task_0004_m_000496_0 output from bay.isi.edu.
2007-07-05 10:46:25,107 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying task_0004_m_000495_0 output from air.isi.edu.
2007-07-05 10:46:25,108 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying task_0004_m_000495_1 output from hey.isi.edu.
2007-07-05 10:46:25,109 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Ignoring obsolete copy result for Map Task: task_0004_m_000500_1 from host: sky.isi.edu
2007-07-05 10:46:25,111 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 done copying task_0004_m_000496_0 output from bay.isi.edu.
2007-07-05 10:46:25,148 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 done copying task_0004_m_000495_0 output from air.isi.edu.
2007-07-05 10:46:25,184 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Ignoring obsolete copy result for Map Task: task_0004_m_000496_1 from host: rim.isi.edu
2007-07-05 10:46:25,184 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Ignoring obsolete copy result for Map Task: task_0004_m_000495_1 from host: hey.isi.edu
2007-07-05 10:46:25,185 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying of all map outputs complete. Initiating the last merge on the remaining files in ramfs://mapoutput24529889
2007-07-05 10:46:27,558 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Merge of the 503 files in InMemoryFileSystem complete. Local file is /usr/local/hadoop/hadoop-hadoop/mapred/local/task_0004_r_000003_0/map_332.out
2007-07-05 10:46:28,528 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:0=1/342 [rec/s] out:0=0/342 [rec/s]
2007-07-05 10:46:28,575 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2007-07-05 10:46:28,601 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2007-07-05 10:46:28,612 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2007-07-05 10:46:28,794 WARN org.apache.hadoop.mapred.TaskTracker: Error running child



To try to figure out what's going on, I tapped the I/O around my
reducer by wrapping it in this shell script:

	#!/bin/sh
	# Capture the reducer's stdin and stdout so they can be inspected
	# after the run; the second tee already forwards the output downstream.
	# (The trailing "exit 0" masks the reduce command's own exit status.)
	tee /tmp/reduce.$$.in | $HOME/reduce_command 16 | tee /tmp/reduce.$$.out
	exit 0
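
(For reference, a wrapper like that is simply handed to streaming as
the reducer.  The invocation looks roughly like the following; the jar
path and the input/output/mapper names here are illustrative, not
copied from my actual command line:)

	hadoop jar $HADOOP_HOME/contrib/hadoop-streaming.jar \
	    -input   my_input_dir \
	    -output  my_output_dir \
	    -mapper  my_map_command \
	    -reducer reduce_wrapper.sh \
	    -file    reduce_wrapper.sh

The -file option ships the wrapper script out to the task trackers so
it is available when the reduce tasks start.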

And I get 0-length .in and .out files.

This doesn't seem right.  Hadoop streaming should be giving my reduce
program some input, right?  Without input there isn't much I can do.
(My guess is that the reducer never gets any input, eventually times
out, and is killed.)


(When I go look in the raw reduce directories, e.g.,
task_0004_m_000455_0, I do see reasonable-looking data, including my input.)



The other strange thing is I don't get 100% reduce failures, but maybe
490/503 fail.


Any suggestions?

   -John Heidemann



Re: "Broken pipe" in hadoop streaming...looking for debugging hints

Posted by John Heidemann <jo...@isi.edu>.
On Thu, 05 Jul 2007 11:10:31 PDT, John Heidemann wrote: 
>
>I'm running hadoop streaming from svn (version 552930, reasonably
>recent).  My map/reduce job maps ~1M records, but then a few reduces
>succeed and many fail, eventually terminating the job unsuccessfully.
>I'm looking for some debugging hints.
>
>
>The failures are all "broken pipe":
>
>java.io.IOException: R/W/S=1/0/0 in:0=1/342 [rec/s] out:0=0/342 [rec/s]
>minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
>HOST=null
>USER=hadoop
>HADOOP_USER=null
>last Hadoop input: |null|
>last tool output: |null|
>Date: Thu Jul 05 10:46:28 PDT 2007
>Broken pipe
>	at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:91)
>	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:323)
>	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1763)
>
>...
>
>
>The other strange thing is I don't get 100% reduce failures, but maybe
>490/503 fail.
>


I found the problem... yes, "broken pipe" is the error message you get
when your reduce task fails to start (in my case, due to a missing
library).

A bug in my code is clearly my fault.

But I might suggest that it would be nice for Hadoop streaming to check
the exit codes of the map and reduce commands, so that it could give a
more informative error message.  (I'll see if I can fix this.)
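
In the meantime, a thin wrapper can at least record whether the command
started and what it exited with.  A rough sketch (the command name and
the status-file path below are placeholders, not my real setup):

	#!/bin/sh
	# Run the real reducer and remember its exit status.
	$HOME/reduce_command 16
	status=$?
	# Record the status somewhere inspectable, since the streaming
	# framework does not surface it in its error message.
	echo "reduce_command exited with status $status" >> /tmp/reduce.$$.status
	# Pass the status back up in case the framework starts checking it.
	exit $status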

The other odd thing is that the ~10 reduces that do succeed appear to
be reducers that received no input, which Hadoop streaming silently
turns into a successful task with no output.

   -John