Posted to mapreduce-user@hadoop.apache.org by Steve Lewis <lo...@gmail.com> on 2010/06/16 18:15:55 UTC

Problem with Reducer emitting a different Key than Mapper

I have the following code, where the Mapper emits a custom key and the
reducer is expected to emit Text.

Using Hadoop 0.2 on a local instance, I ask the reducer to write Text,Text -
this is even what the IDE says I should do -
and what I get is the exception below. Any bright ideas?

public class PartitionReducer extends Reducer<GenonePartitionKey, Text, Text, Text>
{

    /**
     * This method is called once for each key. Most applications will define
     * their reduce class by overriding this method. The default implementation
     * is an identity function.
     */
    @Override
    protected void reduce(GenonePartitionKey key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException
    {
        context.write(new Text("Foo"), new Text("Bar"));

        ....
    }

}


10/06/16 09:08:45 INFO mapred.MapTask: Starting flush of map output
10/06/16 09:08:45 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1123)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.systemsbiology.hadoop.GenonePartitionKey
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1201)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.systemsbiology.hadoop.PartitionReducer.reduce(PartitionReducer.java:165)
    at org.systemsbiology.hadoop.PartitionReducer.reduce(PartitionReducer.java:23)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.systemsbiology.hadoop.PartitionReducer.run(PartitionReducer.java:259)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)


-- 
Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA

Re: Closable Map

Posted by Aaron Kimball <aa...@cloudera.com>.
The Mapper class has a close() method. You can put arbitrary code in there.

You can even save a reference to the OutputCollector during the map()
method, and then emit "final" key, value pairs after the entire InputSplit
has been processed.

- Aaron
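
A minimal plain-Java sketch of the pattern described above, for illustration only. It does not depend on Hadoop: Collector is a hypothetical stand-in for OutputCollector, and CountingMapper is an invented example class. The idea is to remember the collector passed to map() and use it once more in close(), after the last record has been seen. With the newer org.apache.hadoop.mapreduce API, the equivalent hook is Mapper.cleanup(Context) rather than close().

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for Hadoop's OutputCollector; not the real interface.
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Example mapper that counts records and emits one "final" pair on close().
final class CountingMapper implements Closeable {
    private Collector<String, Long> saved; // reference kept from the last map() call
    private long count;

    // Called once per input record; emits nothing, only updates state.
    void map(String record, Collector<String, Long> output) {
        saved = output;
        count++;
    }

    // Called once after all records have been processed.
    @Override
    public void close() throws IOException {
        if (saved != null) {          // map() may never have been called
            saved.collect("records", count);
        }
    }
}
```

Nothing is emitted while records are being mapped; the single ("records", count) pair appears only when close() runs. The null check covers an empty input, where no collector was ever saved.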

On Thu, Jun 17, 2010 at 5:22 AM, Mohamed Riadh Trad
<Mo...@inria.fr> wrote:

> I think there is a Closeable interface in Java; what about Hadoop Pipes?
> On 17 June 2010 at 14:21, Mohamed Riadh Trad wrote:
>
> > Hi all,
> >
> > Is it possible to run instructions after all the records from the record
> > reader have been processed within a map?
> >
> > (In a close() function?)
> >
> > Regards.
>
>

Re: Closable Map

Posted by Mohamed Riadh Trad <Mo...@inria.fr>.
I think there is a Closeable interface in Java; what about Hadoop Pipes?
On 17 June 2010 at 14:21, Mohamed Riadh Trad wrote:

> Hi all,
>
> Is it possible to run instructions after all the records from the record reader have been processed within a map?
>
> (In a close() function?)
>
> Regards.


Closable Map

Posted by Mohamed Riadh Trad <Mo...@inria.fr>.
Hi all,

Is it possible to run instructions after all the records from the record reader have been processed within a map?

(In a close() function?)

Regards.

Re: Problem with Reducer emitting a different Key than Mapper

Posted by Steve Lewis <lo...@gmail.com>.
Yes - that was the problem - thanks

On Wed, Jun 16, 2010 at 9:26 AM, Alex Kozlov <al...@cloudera.com> wrote:

> Hi Steve, did you do
>
> job.setOutputKeyClass(Text.class);
> job.setOutputValueClass(Text.class);
>
> ?
>
> Alex K



Re: Problem with Reducer emitting a different Key than Mapper

Posted by Alex Kozlov <al...@cloudera.com>.
Hi Steve, did you do

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

?

Alex K
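
For anyone hitting the same exception, here is a plain-Java illustration (no Hadoop dependency) of the kind of check that produces "wrong key class". CheckedWriter and CustomKey are hypothetical stand-ins, not Hadoop classes. The assumption is that the intermediate writer is told from the job configuration which key and value classes to expect, and rejects any record whose runtime class differs. In the trace above, the rejected write is made by PartitionReducer while it runs under NewCombinerRunner, so its Text key is checked against the key class expected for map output.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom key type such as GenonePartitionKey.
final class CustomKey {
    final String id;
    CustomKey(String id) { this.id = id; }
}

// Hypothetical stand-in for an intermediate-output writer: it is configured
// with the expected key and value classes and rejects anything else.
final class CheckedWriter {
    private final Class<?> keyClass;
    private final Class<?> valueClass;
    private final List<Object[]> records = new ArrayList<>();

    CheckedWriter(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    // Exact class comparison: a key of any other class is an error.
    void append(Object key, Object value) throws IOException {
        if (key.getClass() != keyClass) {
            throw new IOException("wrong key class: " + key.getClass()
                    + " is not " + keyClass);
        }
        if (value.getClass() != valueClass) {
            throw new IOException("wrong value class: " + value.getClass()
                    + " is not " + valueClass);
        }
        records.add(new Object[] {key, value});
    }

    int size() { return records.size(); }
}
```

A writer built with new CheckedWriter(CustomKey.class, String.class) accepts a CustomKey key but throws on a String key, with a message of the same shape as the one in the trace. In a real job the declared classes come from the driver, for example the job.setOutputKeyClass and job.setOutputValueClass calls mentioned above; Job also has setMapOutputKeyClass and setMapOutputValueClass for the case where the map output types differ from the final output types.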
