Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2011/11/01 18:08:32 UTC

setGroupingComparatorClass

Hi, Hadoop experts,

I've written my custom GroupComparator, and I want to tell Hadoop about it.

Now, there is a call

job.setGroupingComparatorClass(),

but I only find it in the mapreduce package of version 0.21. In prior versions,
I see a similar call

conf.setOutputValueGroupingComparator(GroupComparator.class);

but it does not cause my GroupComparator to be used.

So my question is, should I change the code to use the mapreduce package
(not a problem, since Cloudera has it backported to the current
distribution), or is there a different, simpler way?

Thank you. Sincerely,
Mark

Re: setGroupingComparatorClass

Posted by Mark Kerzner <ma...@gmail.com>.
Here is my GroupComparator. With it, I want to compare only one part of my
composite key (the SKU), so that all keys that match in that part go to the
same reducer and are presented to it in a single reduce() call with all of
their values:

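import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite keys by their SKU part only: keys with the same SKU that
// arrive next to each other in the sorted reduce input share one reduce() call.
public class GroupComparator extends WritableComparator {

    public GroupComparator() {
        // true = create KeyTuple instances, so the byte-level compare()
        // deserializes both keys and delegates to compare(k1, k2) below.
        super(KeyTuple.class, true);
    }

    @Override
    public int compare(WritableComparable k1, WritableComparable k2) {
        KeyTuple t1 = (KeyTuple) k1;
        KeyTuple t2 = (KeyTuple) k2;
        // Ignore the rest of the composite key; compare on the SKU alone.
        return t1.getSku().compareTo(t2.getSku());
    }
}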
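
One detail worth separating out: a grouping comparator only decides which adjacent keys, within one reducer's sorted input, share a single reduce() call. Which reducer a key is sent to is decided by the Partitioner, and the default HashPartitioner hashes the whole key, so with more than one reduce task, keys with the same SKU can land on different reducers unless KeyTuple.hashCode() depends on the SKU alone. The sketch below is plain Java with no Hadoop classes; KeyTuple, its `other` field and `partitionBySku` are hypothetical stand-ins. It applies HashPartitioner's arithmetic to the SKU only:

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch, not Hadoop code. KeyTuple is a hypothetical stand-in for
// the poster's composite key: an SKU plus one other (hypothetical) field.
public class SkuPartitionSketch {

    static final class KeyTuple {
        final String sku;
        final String other; // hypothetical second component of the key

        KeyTuple(String sku, String other) {
            this.sku = sku;
            this.other = other;
        }

        String getSku() {
            return sku;
        }
    }

    // Same arithmetic as Hadoop's HashPartitioner, but applied to the SKU only,
    // so every key sharing an SKU lands in the same partition (reducer).
    static int partitionBySku(KeyTuple key, int numReduceTasks) {
        return (key.getSku().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        List<KeyTuple> keys = Arrays.asList(
                new KeyTuple("sku-1", "a"), new KeyTuple("sku-2", "a"),
                new KeyTuple("sku-1", "b"), new KeyTuple("sku-2", "b"));
        for (KeyTuple k : keys) {
            System.out.println(k.sku + "/" + k.other + " -> reducer "
                    + partitionBySku(k, numReduceTasks));
        }
    }
}
```

In an old-API job, the same expression would sit in the getPartition(key, value, numPartitions) method of a class implementing org.apache.hadoop.mapred.Partitioner, registered with JobConf#setPartitionerClass; treat this as a sketch under those assumptions rather than a drop-in class.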

Then, in the reducer, I would expect many values for all the keys that I
declared equal in my GroupComparator.

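    // Old-API (org.apache.hadoop.mapred) Reducer method: it should be called
    // once per group of keys that the grouping comparator treats as equal.
    public void reduce(KeyTuple key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        System.out.println("Reducer key=" + key);
        // Iterate over every value in the group.
        while (values.hasNext()) {
            Text value = values.next();
            System.out.println("Reducer value = " + value);
        }
    }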

Instead, I still get individual full keys with one value, and the debugger
does not step into my GroupComparator.
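
One call per full key is what you see when grouping falls back to the sort comparator, which JobConf does when no grouping comparator is in effect. The plain-Java simulation below is a simplified model of that behavior, not Hadoop's code: the reduce side walks its sorted input and starts a new reduce() call whenever the grouping comparator reports that a key differs from the one before it. KeyTuple, its `seq` field and the three comparators are hypothetical stand-ins. The third case shows that an SKU-only grouping comparator can only merge keys that the sort order has already placed next to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified plain-Java model of reduce-side grouping (not Hadoop code).
// KeyTuple is a hypothetical stand-in for the poster's composite key.
public class GroupingSketch {

    static final class KeyTuple {
        final String sku;
        final int seq; // hypothetical second component of the composite key

        KeyTuple(String sku, int seq) {
            this.sku = sku;
            this.seq = seq;
        }

        String getSku() {
            return sku;
        }

        @Override
        public String toString() {
            return sku + "#" + seq;
        }
    }

    // Full-key sort order: SKU first, then the second component.
    static final Comparator<KeyTuple> SORT_ORDER =
            Comparator.comparing(KeyTuple::getSku).thenComparingInt(k -> k.seq);

    // A sort order that is NOT SKU-first, used to show why the two must agree.
    static final Comparator<KeyTuple> SEQ_FIRST =
            Comparator.comparingInt((KeyTuple k) -> k.seq).thenComparing(KeyTuple::getSku);

    // Grouping on the SKU only, like the GroupComparator in the message above.
    static final Comparator<KeyTuple> GROUP_BY_SKU = Comparator.comparing(KeyTuple::getSku);

    // Sorts the keys, then returns one inner list per simulated reduce() call.
    static List<List<KeyTuple>> group(List<KeyTuple> keys,
                                      Comparator<KeyTuple> sortOrder,
                                      Comparator<KeyTuple> grouping) {
        List<KeyTuple> sorted = new ArrayList<>(keys);
        sorted.sort(sortOrder);
        List<List<KeyTuple>> calls = new ArrayList<>();
        List<KeyTuple> current = null;
        KeyTuple previous = null;
        for (KeyTuple k : sorted) {
            // Each key is compared only with its predecessor, so only
            // adjacent keys can end up in the same reduce() call.
            if (previous == null || grouping.compare(previous, k) != 0) {
                current = new ArrayList<>();
                calls.add(current);
            }
            current.add(k);
            previous = k;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<KeyTuple> keys = Arrays.asList(
                new KeyTuple("B", 2), new KeyTuple("A", 1),
                new KeyTuple("B", 1), new KeyTuple("A", 2));
        // No grouping comparator: grouping falls back to the sort order.
        System.out.println(group(keys, SORT_ORDER, SORT_ORDER));
        // prints [[A#1], [A#2], [B#1], [B#2]]
        System.out.println(group(keys, SORT_ORDER, GROUP_BY_SKU));
        // prints [[A#1, A#2], [B#1, B#2]]
        System.out.println(group(keys, SEQ_FIRST, GROUP_BY_SKU));
        // prints [[A#1], [B#1], [A#2], [B#2]]
    }
}
```

The first line matches the symptom described above; the second is the intended result, which also requires the job's key sort order to put the SKU first.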

Thanks a bunch!

Mark

On Tue, Nov 1, 2011 at 1:32 PM, Harsh J <ha...@cloudera.com> wrote:

> Hey Mark,
>
> What problem do you see when you use
> JobConf#setOutputValueGroupingComparator(…) when writing jobs with the
> stable API?
>
> I've used it many times and it does get applied.
>
> --
> Harsh J
>

Re: setGroupingComparatorClass

Posted by Harsh J <ha...@cloudera.com>.
Hey Mark,

What problem do you see when you use
JobConf#setOutputValueGroupingComparator(…) when writing jobs with the
stable API?

I've used it many times and it does get applied.

-- 
Harsh J