Posted to common-dev@hadoop.apache.org by Kunsheng Chen <ha...@yahoo.com> on 2008/08/18 22:32:16 UTC

OK to remove conf.setCombinerClass(myReduce.class)?

Hello, everyone. 

I am working through the WordCount example and have to remove conf.setCombinerClass(myReduce.class) from the main method in order to have the mapper and reducer run with different types of output collector.

I am not sure whether this causes a performance problem.  Please let me know if it does.  Any ideas are appreciated.

Re: OK to remove conf.setCombinerClass(myReduce.class)?

Posted by Kunsheng Chen <ha...@yahoo.com>.

Thanks, Ted!  You are absolutely right. 

I took your idea and wrote a separate combiner class that simply emits IntWritable as its output value, and it works like a charm!
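For anyone wondering why the original setup compiled but still failed at run time: the stack trace points at SequenceFile$Writer.append, and the message format suggests a check of the value's runtime class against the configured map output value class. Here is a minimal plain-Java sketch of that kind of check. It is not Hadoop source code; ValueClassCheckSketch and collect are hypothetical names, with Integer standing in for IntWritable and String for Text.

```java
import java.io.IOException;

class ValueClassCheckSketch {
    // Hypothetical collector check modeled on the message in the stack trace.
    // It compares the runtime class of the value with the configured class,
    // which is why a type mismatch only shows up when the job runs.
    static void collect(Class<?> configuredValueClass, Object value) throws IOException {
        if (value.getClass() != configuredValueClass) {
            throw new IOException("wrong value class: " + value.getClass().getName()
                + " is not " + configuredValueClass);
        }
    }

    public static void main(String[] args) {
        try {
            // A combiner that emits the map output value type is accepted.
            collect(Integer.class, 5);
            // A reducer reused as combiner that emits text is rejected.
            collect(Integer.class, "5");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a separate combiner that emits the same value type the mapper emits, the second call never happens, which matches what I saw after the change.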

Also, thanks for correcting my wording; I meant to say 'notes', as you said.

Best,

Kunsheng

--- On Tue, 8/19/08, Ted Dunning <te...@gmail.com> wrote:
From: Ted Dunning <te...@gmail.com>
Subject: Re: OK to remove conf.setCombinerClass(myReduce.class)?
To: keyek@yahoo.com
Cc: core-dev@hadoop.apache.org
Date: Tuesday, August 19, 2008, 4:04 AM


Re: OK to remove conf.setCombinerClass(myReduce.class)?

Posted by Ted Dunning <te...@gmail.com>.

What are the type signatures of InDegreeMap and InDegreeReduce?

Remember that the input to your combiner has to be compatible with the
output of the mapper and the output of the combiner has to be compatible
with the input of the reducer.  If the reducer has different types for the
input or output, then you can't use that reducer as a combiner.

The reducer == combiner situation works with word counting because the input
and output types for the reducer are both integers.  In your case, it looks
like your reducer takes IntWritable and produces Text.  That means that the
output of the combiner will not be compatible with the input of the reducer.
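
To make the type rule concrete, here is a minimal plain-Java sketch. It deliberately does not use the Hadoop API: GroupFn, SUM, SUM_AS_TEXT and runKey are hypothetical names invented for illustration, with Integer standing in for IntWritable and String for Text. In the old org.apache.hadoop.mapred API the analogous combiner would implement Reducer<Text, IntWritable, Text, IntWritable>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinerTypeSketch {
    // Hypothetical stand-in for a reduce-style function; NOT Hadoop's Reducer.
    interface GroupFn<VIN, VOUT> {
        VOUT apply(String key, List<VIN> values);
    }

    // Usable as a combiner: takes the map output value type and emits the same type.
    static final GroupFn<Integer, Integer> SUM = (key, values) -> {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    };

    // Final reducer only: takes Integer but emits String (like IntWritable in, Text out).
    static final GroupFn<Integer, String> SUM_AS_TEXT =
        (key, values) -> key + ":" + SUM.apply(key, values);

    // The signature encodes the rule: the combiner must be V -> V because its
    // output is fed to the reducer, which expects the map output type V.
    static <V, OUT> OUT runKey(String key, List<List<V>> spills,
                               GroupFn<V, V> combiner, GroupFn<V, OUT> reducer) {
        List<V> combined = new ArrayList<>();
        for (List<V> spill : spills) {
            combined.add(combiner.apply(key, spill)); // one partial result per spill
        }
        return reducer.apply(key, combined);
    }

    public static void main(String[] args) {
        List<List<Integer>> spills =
            Arrays.asList(Arrays.asList(1, 1, 1), Arrays.asList(1, 1));
        System.out.println(runKey("pageA", spills, SUM, SUM_AS_TEXT)); // prints pageA:5
        // runKey("pageA", spills, SUM_AS_TEXT, SUM_AS_TEXT) would not compile,
        // because SUM_AS_TEXT is Integer -> String rather than V -> V.
    }
}
```

The sketch catches the mismatch at compile time only because runKey spells out the constraint in its generic signature; a job that is configured with class objects would only find out when it runs.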


Did you perhaps mean for the output of the reducer to be IntWritable?

Also, I think that the word "denote" does not mean what you intend here.
Your meaning is clear, but your usage could lead to confusion in other
situations.

On Mon, Aug 18, 2008 at 2:57 PM, Kunsheng Chen <ke...@yahoo.com> wrote:



-- 
ted

Re: OK to remove conf.setCombinerClass(myReduce.class)?

Posted by Kunsheng Chen <ke...@yahoo.com>.

Thanks for your reply. I need to set different output types for Map and Reduce. These calls are inside the main method (InDegreeReduce and InDegreeMap implement the interfaces):

 
       conf.setMapOutputKeyClass(Text.class);
       conf.setMapOutputValueClass(IntWritable.class);

       conf.setOutputKeyClass(Text.class);
       conf.setOutputValueClass(Text.class);

       conf.setMapperClass(InDegreeMap.class);
       
       // I have to denote the following line, otherwise an exception is thrown when running
       // conf.setCombinerClass(InDegreeReduce.class);
       
       conf.setReducerClass(InDegreeReduce.class);


I have to denote 'conf.setCombinerClass(...)'; otherwise I get the errors below when running (it compiles successfully).

------------------------------------------

08/08/18 17:45:52 INFO mapred.JobClient: Task Id : task_200808161218_0077_m_000001_0, Status : FAILED
java.io.IOException: wrong value class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:998)
	at org.apache.hadoop.mapred.MapTask$CombineOutputCollector.collect(MapTask.java:1083)
	at DFS_InDegree$InDegreeReduce.reduce(DFS_InDegree.java:77)
	at DFS_InDegree$InDegreeReduce.reduce(DFS_InDegree.java:52)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:876)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:782)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:694)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

--------------------------------------------

I think something goes wrong when the combiner tries to handle the map and reduce output, but I don't know what exactly it is.  Thanks again for your reply.

Please let me know if you have any idea. 

Best,

Keye

--- On Mon, 8/18/08, Ted Dunning <te...@gmail.com> wrote:

> From: Ted Dunning <te...@gmail.com>
> Subject: Re: OK to remove conf.setCombinerClass(myReduce.class)?
> To: core-dev@hadoop.apache.org, hadoopchan@yahoo.com
> Date: Monday, August 18, 2008, 8:57 PM

Re: OK to remove conf.setCombinerClass(myReduce.class)?

Posted by Ted Dunning <te...@gmail.com>.

This doesn't sound right.

There are two reasons.

First, the combiner is critical for good performance on word count
applications.

Secondly, having a combiner should not prevent having a different kind of
collector.  The combiner should just look like the reducer.

Why do you think that having the combiner is causing your problem?
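
As a rough illustration of the performance point, here is a small plain-Java simulation, not Hadoop code. It compares how many (word, count) records two map tasks would hand to the shuffle with and without a local combine step. The class and method names are hypothetical, and each input split is assumed to be non-empty, whitespace-separated text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerShuffleSketch {
    // Without a combiner, a map task emits one (word, 1) record per token.
    static int recordsWithoutCombiner(String split) {
        return split.trim().split("\\s+").length;
    }

    // With a combiner, counts are summed per word inside the map task,
    // so only one record per distinct word leaves the task.
    static Map<String, Integer> combine(String split) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // The reducer sums the partial counts; the totals match the uncombined case.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("to be or not to be", "to do or not to do");
        int plain = 0;
        int combined = 0;
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String split : splits) {
            plain += recordsWithoutCombiner(split);
            Map<String, Integer> partial = combine(split);
            combined += partial.size();
            partials.add(partial);
        }
        System.out.println("records without combiner: " + plain);    // 12
        System.out.println("records with combiner:    " + combined); // 8
        System.out.println("final counts: " + reduce(partials));
    }
}
```

On real text, where a few words repeat very often within each split, the reduction would typically be much larger than in this toy input.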

On Mon, Aug 18, 2008 at 1:32 PM, Kunsheng Chen <ha...@yahoo.com> wrote:


-- 
ted