You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by Joe Shaw <jo...@joeshaw.org> on 2008/09/25 23:26:34 UTC

The reduce copier failed

Hi,

I'm trying to build an index using the "index" contrib in Hadoop
0.18.0, but the reduce tasks are consistently failing.

In the output from the "hadoop jar" command, I see messages like this:

08/09/25 14:12:11 INFO mapred.JobClient:  map 27% reduce 4%
08/09/25 14:12:23 INFO mapred.JobClient: Task Id :
attempt_200809180916_0027_r_000007_2, Status : FAILED
java.io.IOException: attempt_200809180916_0027_r_000007_2The reduce
copier failed
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

and eventually failing.

The output from "hadoop job -history" gives me:

Task Summary
============================
Kind    Total   Successful      Failed  Killed  StartTime       FinishTime

Map     57      57              0       0       25-Sep-2008 14:03:07
 25-Sep-2008 14:13:17 (10mins, 9sec)
Reduce  4       0               4       0       25-Sep-2008 14:03:14
 25-Sep-2008 14:13:21 (10mins, 7sec)
============================

and

FAILED REDUCE task list for job_200809180916_0027
TaskId          StartTime       FinishTime      Error
====================================================
task_200809180916_0027_r_000007 25-Sep-2008 14:03:14    25-Sep-2008
14:13:21 (10mins, 7sec)

Grepping in the logs for that task, I see this consistently on the TaskTrackers:

hadoop-jshaw-tasktracker-ars1dev3.log:2008-09-25 14:09:08,685 INFO
org.apache.hadoop.mapred.TaskTracker:
attempt_200809180916_0027_r_000007_1 0.016147636% reduce > copy (14 of
289 at 8.37 MB/s) >
hadoop-jshaw-tasktracker-ars1dev3.log:2008-09-25 14:09:11,904 INFO
org.apache.hadoop.mapred.TaskTracker:
attempt_200809180916_0027_r_000007_1 0.018454442% reduce > copy (16 of
289 at 7.85 MB/s) >
hadoop-jshaw-tasktracker-ars1dev3.log:2008-09-25 14:09:17,337 INFO
org.apache.hadoop.mapred.TaskRunner:
attempt_200809180916_0027_r_000007_1 done; removing files.

As you can see, it apparently is unable to copy the data, but it gives
me absolutely no idea why.  The JobTracker logs also give me no useful
information.

Anybody have an idea what's going on, or how I might go about debugging this?

Thanks,
Joe

Re: The reduce copier failed

Posted by Joe Shaw <jo...@joeshaw.org>.

Hi,

On Fri, Sep 26, 2008 at 2:50 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>> Caused by: org.apache.lucene.index.MergePolicy$MergeException: segment
>> "_bfu exists in external directory yet the MergeScheduler executed the
>> merge in a separate thread
>>       at
>
> Hmm... could this be something specific to
> org.apache.lucene.index.MergePolicy?

It certainly appears that way.  I'm using the Lucene 2.3.2 jar file.
I'll try with a couple different versions and file a JIRA issue if it
seems to be 2.3-specific.

Thanks,
Joe

Re: The reduce copier failed

Posted by Arun C Murthy <ac...@yahoo-inc.com>.

>
> 2008-09-25 17:12:18,250 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200809180916_0027_r_000007_2: Got 2 new map-outputs & number
> of known map outputs is 21
> 2008-09-25 17:12:18,251 WARN org.apache.hadoop.mapred.ReduceTask:
> attempt_200809180916_0027_r_000007_2 Merge of the inmemory files threw
> an exception: java.io.IOException: Intermedate merge failed
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
> $InMemFSMergeThread.doInMemMerge(ReduceTask.java:2133)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
> $InMemFSMergeThread.run(ReduceTask.java:2064)
> Caused by: org.apache.lucene.index.MergePolicy$MergeException: segment
> "_bfu exists in external directory yet the MergeScheduler executed the
> merge in a separate thread
>        at

Hmm... could this be something specific to  
org.apache.lucene.index.MergePolicy?

Arun

> org 
> .apache 
> .lucene.index.IndexWriter.copyExternalSegments(IndexWriter.java:2362)
>        at  
> org 
> .apache 
> .lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:2307)
>        at  
> org 
> .apache 
> .hadoop 
> .contrib.index.mapred.IntermediateForm.process(IntermediateForm.java: 
> 135)
>        at  
> org 
> .apache 
> .hadoop 
> .contrib 
> .index.mapred.IndexUpdateCombiner.reduce(IndexUpdateCombiner.java:56)
>        at  
> org 
> .apache 
> .hadoop 
> .contrib 
> .index.mapred.IndexUpdateCombiner.reduce(IndexUpdateCombiner.java:38)
>        at org.apache.hadoop.mapred.ReduceTask 
> $ReduceCopier.combineAndSpill(ReduceTask.java:2160)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access 
> $3100(ReduceTask.java:341)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
> $InMemFSMergeThread.doInMemMerge(ReduceTask.java:2120)
>        ... 1 more
>
> 2008-09-25 17:12:19,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 14523136 bytes from map-output for
> attempt_200809180916_0027_m_000016_0
> 2008-09-25 17:12:19,087 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200809180916_0027_m_000016_0 -> (41, 10651153) from
> ars1dev6
> 2008-09-25 17:12:19,110 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 14506735 bytes (14506735 raw bytes) into RAM from
> attempt_200809180916_0027_m_000009_0
> 2008-09-25 17:12:19,226 INFO org.apache.hadoop.mapred.ReduceTask:
> Closed ram manager
> 2008-09-25 17:12:19,228 WARN org.apache.hadoop.mapred.TaskTracker:
> Error running child
> java.io.IOException: attempt_200809180916_0027_r_000007_2The reduce
> copier failed
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>        at org.apache.hadoop.mapred.TaskTracker 
> $Child.main(TaskTracker.java:2209)
>
> Thanks,
> Joe

Re: The reduce copier failed

Posted by Joe Shaw <jo...@joeshaw.org>.

Hi again,

Ugh, sorry about the butchered output.

On Thu, Sep 25, 2008 at 5:42 PM, Joe Shaw <jo...@joeshaw.org> wrote:
> Hi,
>
> On Thu, Sep 25, 2008 at 5:32 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>> What did the logs for the task-attempt
>> 'attempt_200809180916_0027_r_000007_2' look like? Did the TIP/Job succeed?
>
> You mean inside userlogs/attempt_blah_blah/syslog?  I didn't know
> about this log file before, thanks!
>
> The last several lines of output are:

... now attached as a text file to the email.

Joe

Re: The reduce copier failed

Posted by Joe Shaw <jo...@joeshaw.org>.

Hi,

On Thu, Sep 25, 2008 at 5:32 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
> What did the logs for the task-attempt
> 'attempt_200809180916_0027_r_000007_2' look like? Did the TIP/Job succeed?

You mean inside userlogs/attempt_blah_blah/syslog?  I didn't know
about this log file before, thanks!

The last several lines of output are:

2008-09-25 17:12:18,250 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200809180916_0027_r_000007_2: Got 2 new map-outputs & number
of known map outputs is 21
2008-09-25 17:12:18,251 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_200809180916_0027_r_000007_2 Merge of the inmemory files threw
an exception: java.io.IOException: Intermedate merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2133)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2064)
Caused by: org.apache.lucene.index.MergePolicy$MergeException: segment
"_bfu exists in external directory yet the MergeScheduler executed the
merge in a separate thread
        at org.apache.lucene.index.IndexWriter.copyExternalSegments(IndexWriter.java:2362)
        at org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:2307)
        at org.apache.hadoop.contrib.index.mapred.IntermediateForm.process(IntermediateForm.java:135)
        at org.apache.hadoop.contrib.index.mapred.IndexUpdateCombiner.reduce(IndexUpdateCombiner.java:56)
        at org.apache.hadoop.contrib.index.mapred.IndexUpdateCombiner.reduce(IndexUpdateCombiner.java:38)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2160)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2120)
        ... 1 more

2008-09-25 17:12:19,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
14523136 bytes from map-output for
attempt_200809180916_0027_m_000016_0
2008-09-25 17:12:19,087 INFO org.apache.hadoop.mapred.ReduceTask: Rec
#1 from attempt_200809180916_0027_m_000016_0 -> (41, 10651153) from
ars1dev6
2008-09-25 17:12:19,110 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 14506735 bytes (14506735 raw bytes) into RAM from
attempt_200809180916_0027_m_000009_0
2008-09-25 17:12:19,226 INFO org.apache.hadoop.mapred.ReduceTask:
Closed ram manager
2008-09-25 17:12:19,228 WARN org.apache.hadoop.mapred.TaskTracker:
Error running child
java.io.IOException: attempt_200809180916_0027_r_000007_2The reduce
copier failed
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

Thanks,
Joe

Re: The reduce copier failed

Posted by Arun C Murthy <ac...@yahoo-inc.com>.

On Sep 25, 2008, at 2:26 PM, Joe Shaw wrote:

> Hi,
>
> I'm trying to build an index using the "index" contrib in Hadoop
> 0.18.0, but the reduce tasks are consistently failing.
>

What did the logs for the task-attempt  
'attempt_200809180916_0027_r_000007_2' look like? Did the TIP/Job  
succeed?

Arun

> In the output from the "hadoop jar" command, I see messages like this:
>
> 08/09/25 14:12:11 INFO mapred.JobClient:  map 27% reduce 4%
> 08/09/25 14:12:23 INFO mapred.JobClient: Task Id :
> attempt_200809180916_0027_r_000007_2, Status : FAILED
> java.io.IOException: attempt_200809180916_0027_r_000007_2The reduce
> copier failed
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java: 
> 2209)
>
> and eventually failing.
>
> The output from "hadoop job -history" gives me:
>
> Task Summary
> ============================
> Kind    Total   Successful      Failed  Killed  StartTime        
> FinishTime
>
> Map     57      57              0       0       25-Sep-2008 14:03:07
> 25-Sep-2008 14:13:17 (10mins, 9sec)
> Reduce  4       0               4       0       25-Sep-2008 14:03:14
> 25-Sep-2008 14:13:21 (10mins, 7sec)
> ============================
>
> and
>
> FAILED REDUCE task list for job_200809180916_0027
> TaskId          StartTime       FinishTime      Error
> ====================================================
> task_200809180916_0027_r_000007 25-Sep-2008 14:03:14    25-Sep-2008
> 14:13:21 (10mins, 7sec)
>
> Grepping in the logs for that task, I see this consistently on the  
> TaskTrackers:
>
> hadoop-jshaw-tasktracker-ars1dev3.log:2008-09-25 14:09:08,685 INFO
> org.apache.hadoop.mapred.TaskTracker:
> attempt_200809180916_0027_r_000007_1 0.016147636% reduce > copy (14 of
> 289 at 8.37 MB/s) >
> hadoop-jshaw-tasktracker-ars1dev3.log:2008-09-25 14:09:11,904 INFO
> org.apache.hadoop.mapred.TaskTracker:
> attempt_200809180916_0027_r_000007_1 0.018454442% reduce > copy (16 of
> 289 at 7.85 MB/s) >
> hadoop-jshaw-tasktracker-ars1dev3.log:2008-09-25 14:09:17,337 INFO
> org.apache.hadoop.mapred.TaskRunner:
> attempt_200809180916_0027_r_000007_1 done; removing files.
>
> As you can see, it apparently is unable to copy the data, but it gives
> me absolutely no idea why.  The JobTracker logs also give me no useful
> information.
>
> Anybody have an idea what's going on, or how I might go about  
> debugging this?
>
> Thanks,
> Joe