
Posted to common-dev@hadoop.apache.org by "P.ILAYARAJA" <il...@rediff.co.in> on 2008/08/29 08:55:18 UTC

Problem in Map/Reduce

Hello:

I wrote a simple Map/Reduce program. The output key of the map function is a user-defined datum (class)
with two member strings. The OutputKeyComparatorClass is set to this datum class, and the class
implements the "compareTo" function.

The problem is that the final output of the reduce has the same key occurring in more than one record.
Any thoughts on why this could happen?

Also, I see that the compareTo function is never called with a pair of keys that are the same.

Regards,
Ilay

Re: Problem in Map/Reduce

Posted by Owen O'Malley <om...@apache.org>.

Note that the sort will be very slow if you don't also define a RawComparator
that compares the serialized forms of the keys. Look at IntWritable for how
to do it.
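A minimal standalone sketch of the idea, assuming a hypothetical two-string key with a made-up wire format (per field: a 4-byte big-endian length, then the UTF-8 bytes). It does not use Hadoop classes; compareRaw only mirrors the parameter shape of RawComparator.compare(byte[], int, int, byte[], int, int). The object compareTo and the raw comparison are defined on the same bytes so that both give the same order, which is the property a real raw comparator must preserve.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PairKeyRawCompareSketch {

    /** Hypothetical two-string key, ordered by the unsigned bytes of each field's UTF-8 form. */
    static final class PairKey implements Comparable<PairKey> {
        final String first;
        final String second;

        PairKey(String first, String second) {
            this.first = first;
            this.second = second;
        }

        /** Assumed wire format: per field, a 4-byte big-endian length, then the UTF-8 bytes. */
        byte[] serialize() {
            byte[] f = first.getBytes(StandardCharsets.UTF_8);
            byte[] s = second.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(8 + f.length + s.length)
                    .putInt(f.length).put(f)
                    .putInt(s.length).put(s)
                    .array();
        }

        /** Object comparison; must agree with compareRaw below. */
        @Override
        public int compareTo(PairKey other) {
            byte[] a = first.getBytes(StandardCharsets.UTF_8);
            byte[] b = other.first.getBytes(StandardCharsets.UTF_8);
            int c = compareBytes(a, 0, a.length, b, 0, b.length);
            if (c != 0) {
                return c;
            }
            a = second.getBytes(StandardCharsets.UTF_8);
            b = other.second.getBytes(StandardCharsets.UTF_8);
            return compareBytes(a, 0, a.length, b, 0, b.length);
        }
    }

    /** Unsigned lexicographic comparison of b1[s1, s1 + l1) with b2[s2, s2 + l2). */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2; // a proper prefix sorts first
    }

    /** Reads a big-endian int at an absolute offset without copying the array. */
    static int readInt(byte[] b, int off) {
        return ByteBuffer.wrap(b).getInt(off);
    }

    /**
     * Raw comparison with the same parameter shape as RawComparator.compare:
     * it walks the serialized bytes and never builds PairKey objects.
     * l1 and l2 (whole-record lengths) are unused because each field carries its own length.
     */
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int len1 = readInt(b1, s1);
        int len2 = readInt(b2, s2);
        int c = compareBytes(b1, s1 + 4, len1, b2, s2 + 4, len2);
        if (c != 0) {
            return c;
        }
        int t1 = s1 + 4 + len1; // start of the second field's length
        int t2 = s2 + 4 + len2;
        return compareBytes(b1, t1 + 4, readInt(b1, t1), b2, t2 + 4, readInt(b2, t2));
    }
}
```

In Hadoop itself, this logic would go in a WritableComparator subclass registered with WritableComparator.define(...), which is how IntWritable wires up its own byte-level comparator.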

You need to define a reasonable hashCode, because the default partitioner
uses it to decide which reduce each key is sent to; keys that compare as equal
must return the same hash code. If you define your own partitioner, you could,
for instance, have all of the keys with the same first string go to the same reduce.
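A standalone sketch of why hashCode matters, assuming the same hypothetical two-string key. The default Object.hashCode is identity-based, so two equal keys built as separate objects (for example in different map tasks) need not hash alike and can be routed to different reduces. hashPartition below uses the same arithmetic as Hadoop's default HashPartitioner; firstStringPartition is a hypothetical custom partitioner that routes on the first string only.

```java
import java.util.Objects;

public class PairKeyPartitionSketch {

    /** Hypothetical two-string key, as in the original question. */
    static final class PairKey implements Comparable<PairKey> {
        final String first;
        final String second;

        PairKey(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(PairKey other) {
            int c = first.compareTo(other.first);
            return c != 0 ? c : second.compareTo(other.second);
        }

        // Consistent with compareTo: equal exactly when both fields are equal.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PairKey)) {
                return false;
            }
            PairKey k = (PairKey) o;
            return first.equals(k.first) && second.equals(k.second);
        }

        // Derived only from field values, so equal keys hash alike in every JVM.
        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    /** Same arithmetic as the default HashPartitioner: non-negative hash modulo reduce count. */
    static int hashPartition(PairKey key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Hypothetical custom partitioner: all keys sharing a first string go to one reduce. */
    static int firstStringPartition(PairKey key, int numReduceTasks) {
        return (key.first.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

The mask with Integer.MAX_VALUE clears the sign bit, so a negative hash code still yields a partition in the range 0 to numReduceTasks - 1.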

And yes, the function you need to define, assuming you don't have a
RawComparator, is compareTo, not equals.
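To illustrate why compareTo, not equals, is what counts, here is a simplified model (not Hadoop's actual code) of how one reduce walks its sorted input: keys arrive already sorted by the comparator, and consecutive keys become one group whenever the comparator reports 0. equals is never consulted. The method name groupSorted is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReduceGroupingSketch {

    /** Splits keys already sorted by cmp into runs that cmp considers equal. */
    static <K> List<List<K>> groupSorted(List<K> sortedKeys, Comparator<? super K> cmp) {
        List<List<K>> groups = new ArrayList<>();
        List<K> current = null;
        K previous = null;
        for (K key : sortedKeys) {
            // Start a new group on the first key or whenever the comparator sees a change.
            if (current == null || cmp.compare(previous, key) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(key);
            previous = key;
        }
        return groups;
    }
}
```

For example, with String.CASE_INSENSITIVE_ORDER the sorted input "a", "A", "b" forms two groups, even though "a".equals("A") is false. Because this grouping only happens inside a single reduce, two equal keys that were partitioned to different reduces are never compared with each other, which would be consistent with the symptom in the original question.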