Posted to common-user@hadoop.apache.org by Jim Twensky <ji...@gmail.com> on 2010/01/26 08:27:12 UTC

Question on GroupingComparatorClass

Hi,

I'm using a custom grouping comparator class to simulate a secondary
sort on values, and I set it via Job.setGroupingComparatorClass (using
Hadoop 0.20.x) inside my driver. I'm wondering if this class is also
used when grouping the records in the combiner.

Using a combiner greatly improves the performance in my case, but for
the combiners, I want to use the default comparator, not the custom
one that I use before the actual reduce.

Is there a way to just set the custom grouping comparator for the
reduce and bypass it during the combine stage?


Thanks,
Jim

Re: Question on GroupingComparatorClass

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
I think the combiner gets only the key's sort comparator, not the grouping comparator. So I believe the combiner groups records with the sort comparator, while the custom grouping comparator applies only on the reducer.
Here's a relevant snippet of code:
{
      super(inputCounter, conf, reporter);
      combinerClass = cls;
      keyClass = (Class<K>) job.getMapOutputKeyClass();
      valueClass = (Class<V>) job.getMapOutputValueClass();
      comparator = (RawComparator<K>) job.getOutputKeyComparator();
    }
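
To make the difference concrete, here is a rough plain-Java simulation of the two grouping behaviours. It is not Hadoop code and does not call any Hadoop API; CompositeKey, SORT, GROUPING and group() are hypothetical names made up for this illustration, and it assumes a composite key made of a natural key plus a secondary field taken from the value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingDemo {
    // Hypothetical composite key: natural key plus the secondary-sort field.
    static final class CompositeKey {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }

        @Override
        public String toString() {
            return natural + "#" + secondary;
        }
    }

    // Stand-in for the sort comparator: natural key first, then secondary field.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.natural)
                  .thenComparingInt(k -> k.secondary);

    // Stand-in for the custom grouping comparator: natural key only.
    static final Comparator<CompositeKey> GROUPING =
        Comparator.comparing((CompositeKey k) -> k.natural);

    // Walks already-sorted keys and opens a new group whenever the given
    // comparator says the current key differs from the previous one.
    static List<List<CompositeKey>> group(List<CompositeKey> sorted,
                                          Comparator<CompositeKey> cmp) {
        List<List<CompositeKey>> groups = new ArrayList<>();
        CompositeKey prev = null;
        for (CompositeKey k : sorted) {
            if (prev == null || cmp.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>(Arrays.asList(
            new CompositeKey("b", 2), new CompositeKey("a", 3),
            new CompositeKey("a", 1), new CompositeKey("b", 2)));
        keys.sort(SORT);
        // Grouping by the sort comparator, as the combiner appears to do.
        System.out.println("combiner-style groups: " + group(keys, SORT));
        // Grouping by the natural key only, as the custom comparator would on the reducer.
        System.out.println("reducer-style groups:  " + group(keys, GROUPING));
    }
}
```

With the four sample keys, grouping by SORT keeps a#1 and a#3 in separate groups (three groups in total), while grouping by GROUPING puts them together (two groups). If the combiner really only sees the sort comparator, it would merge only records whose full composite keys compare equal.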

Amogh
