Posted to mapreduce-user@hadoop.apache.org by Bryan Yeung <br...@gmail.com> on 2012/04/17 03:44:22 UTC

map and reduce with different value classes

Hello Everyone,

I'm relatively new to Hadoop MapReduce, and I'm trying to get the
simple modification to the WordCount example below to work.

I'm using hadoop-1.0.2; for convenience I've included a diff below and
also attached my new WordCount.java file.

What I'm trying to achieve is for the value class output by the map
phase to be different from the value class output by the reduce phase.
In other words, I want the map to emit Text/IntWritable pairs while the
reduce emits Text/Text pairs.

Any help would be greatly appreciated!

Thanks,

Bryan

diff --git a/WordCount.java.orig b/WordCount.java
index 81a6c21..6a768f7 100644
--- a/WordCount.java.orig
+++ b/WordCount.java
@@ -33,8 +33,8 @@ public class WordCount {
   }

   public static class IntSumReducer
-       extends Reducer<Text,IntWritable,Text,IntWritable> {
-    private IntWritable result = new IntWritable();
+       extends Reducer<Text,IntWritable,Text,Text> {
+    private Text result = new Text();

     public void reduce(Text key, Iterable<IntWritable> values,
                        Context context
@@ -43,7 +43,7 @@ public class WordCount {
       for (IntWritable val : values) {
         sum += val.get();
       }
-      result.set(sum);
+      result.set("" + sum);
       context.write(key, result);
     }
   }
@@ -58,10 +58,11 @@ public class WordCount {
     Job job = new Job(conf, "word count");
     job.setJarByClass(WordCount.class);
     job.setMapperClass(TokenizerMapper.class);
+       job.setMapOutputValueClass(IntWritable.class);
     job.setCombinerClass(IntSumReducer.class);
     job.setReducerClass(IntSumReducer.class);
     job.setOutputKeyClass(Text.class);
-    job.setOutputValueClass(IntWritable.class);
+    job.setOutputValueClass(Text.class);
     FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
     FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
     System.exit(job.waitForCompletion(true) ? 0 : 1);

Re: map and reduce with different value classes

Posted by Bryan Yeung <br...@gmail.com>.
Oh no!  I just figured it out :-/

It's actually dying in the combine step, with an error about the
reducer class, because the WordCount example registers IntSumReducer as
both the combiner and the reducer. The combiner runs on the map side,
so its output types have to match the map output types (Text and
IntWritable), but my modified IntSumReducer now emits Text values.

This makes sense now.
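
For the archives, here's a rough sketch of the kind of fix I have in
mind (untested, and IntSumCombiner is just a name I made up): give the
combiner its own class whose output types still match the map output
(Text/IntWritable), and let only the reducer emit Text values. This
drops straight into the existing WordCount.java, which already imports
Text, IntWritable, Reducer and IOException:

  public static class IntSumCombiner
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      // the combiner's output value stays IntWritable, matching the map output
      result.set(sum);
      context.write(key, result);
    }
  }

  // ...and in the driver:
  job.setCombinerClass(IntSumCombiner.class);  // was IntSumReducer
  job.setReducerClass(IntSumReducer.class);    // Reducer<Text,IntWritable,Text,Text>

The simpler alternative, of course, is to just drop the
setCombinerClass() call entirely.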

Sorry for the silly question, and thanks for the help!

Bryan

On Mon, Apr 16, 2012 at 11:36 PM, Bejoy Ks <be...@gmail.com> wrote:
> Hi Bryan
>
>     Can you post the error stack trace?
>
> Regards
> Bejoy KS

Re: map and reduce with different value classes

Posted by Bejoy Ks <be...@gmail.com>.
Hi Bryan

     Can you post the error stack trace?

Regards
Bejoy KS

On Tue, Apr 17, 2012 at 8:41 AM, Bryan Yeung <br...@gmail.com> wrote:
> Hello Bejoy,
>
> Thanks for your reply.
>
> Isn't that exactly what I've done with my modifications to
> WordCount.java?  Could you have a look at the diff I supplied and/or
> the WordCount.java file I attached and tell me how I've deviated from
> what you say below?
>
> Thanks,
>
> Bryan
>
> On Mon, Apr 16, 2012 at 11:03 PM, Bejoy Ks <be...@gmail.com> wrote:
>> Hi Bryan
>>      You can set different map and reduce key/value types with the following steps:
>> - ensure that the map output key/value types match the reducer input key/value types
>> - specify them in your driver class as
>>
>> // set map output key/value types
>> job.setMapOutputKeyClass(theClass);
>> job.setMapOutputValueClass(theClass);
>>
>> // set final/reduce output key/value types
>> job.setOutputKeyClass(Text.class);
>> job.setOutputValueClass(IntWritable.class);
>>
>> If the map output and the final (reduce) output key/value types are the same,
>> you only need to specify the final output types.
>>
>> Regards
>> Bejoy KS

Re: map and reduce with different value classes

Posted by Bryan Yeung <br...@gmail.com>.
Hello Bejoy,

Thanks for your reply.

Isn't that exactly what I've done with my modifications to
WordCount.java?  Could you have a look at the diff I supplied and/or
the WordCount.java file I attached and tell me how I've deviated from
what you say below?

Thanks,

Bryan

On Mon, Apr 16, 2012 at 11:03 PM, Bejoy Ks <be...@gmail.com> wrote:
> Hi Bryan
>      You can set different map and reduce key/value types with the following steps:
> - ensure that the map output key/value types match the reducer input key/value types
> - specify them in your driver class as
>
> // set map output key/value types
> job.setMapOutputKeyClass(theClass);
> job.setMapOutputValueClass(theClass);
>
> // set final/reduce output key/value types
> job.setOutputKeyClass(Text.class);
> job.setOutputValueClass(IntWritable.class);
>
> If the map output and the final (reduce) output key/value types are the same,
> you only need to specify the final output types.
>
> Regards
> Bejoy KS

Re: map and reduce with different value classes

Posted by Bejoy Ks <be...@gmail.com>.
Hi Bryan
     You can set different map and reduce key/value types with the following steps:
- ensure that the map output key/value types match the reducer input key/value types
- specify them in your driver class as

// set map output key/value types
job.setMapOutputKeyClass(theClass);
job.setMapOutputValueClass(theClass);

// set final/reduce output key/value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

If the map output and the final (reduce) output key/value types are the same,
you only need to specify the final output types.
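
For example, with the classes from your diff (map output value
IntWritable, final output value Text, and assuming the stock
TokenizerMapper from the example), a minimal driver sketch would look
roughly like this (untested; note that it sets no combiner, since a
combiner's output types must match the map output types):

  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  Job job = new Job(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);   // Mapper<Object,Text,Text,IntWritable>
  job.setReducerClass(IntSumReducer.class);    // Reducer<Text,IntWritable,Text,Text>

  // map output key/value types
  job.setMapOutputKeyClass(Text.class);
  job.setMapOutputValueClass(IntWritable.class);

  // final/reduce output key/value types
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);

  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);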

Regards
Bejoy KS



