Posted to common-user@hadoop.apache.org by zsongbo <zs...@gmail.com> on 2009/05/07 12:02:07 UTC

setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Hi all,
I have an application in which sorting and grouping need to use
different Comparators.

I tested this on 0.19.1 and 0.20.0, but in neither version does it work for
the Combiner.

In 0.19.1 I use job.setOutputValueGroupingComparator(), and
in 0.20.0 I use job.setGroupingComparatorClass().

This works in the reduce phase: the reducer groups the keys with the
Comparator set above and sorts them with the default comparator of the key class.

But I also want the combiner to use a separate comparator for grouping,
different from the one used for sorting. Is that possible?

Schubert

Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Posted by zsongbo <zs...@gmail.com>.
Thanks Min Zhou,
I only had a quick look at 0.20 last week. It seems to be a big code
reorganization.
I have read the SecondarySort class too; it seems to be the same as in 0.19.

A Partitioner can only hash/partition the map output key-value pairs to
different reducers. In my code I have a partitioner that partitions the output
by the hash of userid.

My question is how to get the same separate-grouping feature in the
Combiner.
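
As an illustration of the partitioning described above, here is a minimal
plain-Java sketch (no Hadoop classes). It assumes the key is a string of the
form "city+userid+time" joined with '+'; the class and method names are
hypothetical. The formula is the common hash-partitioning one: mask off the
sign bit, then take the remainder.

```java
class UserIdPartitionDemo {
    // Assumed key layout: "city+userid+time"; returns the userid field.
    static String userIdOf(String compositeKey) {
        String[] parts = compositeKey.split("\\+");
        return parts[1];
    }

    // Picks a reducer from the userid only, so every record of one user
    // reaches the same reducer regardless of its time field.
    static int partitionFor(String compositeKey, int numReducers) {
        int hash = userIdOf(compositeKey).hashCode();
        // Mask the sign bit so the remainder is never negative.
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        String[] keys = {"NYC+u1+100", "NYC+u1+200", "LA+u2+150"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

This only decides which reducer receives a record. It has no influence on how
keys are grouped inside the combiner, which is the point of the question.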

On Mon, May 11, 2009 at 8:37 PM, Min Zhou <co...@gmail.com> wrote:

> Hey Schubert,
>
> You need at least two new classes, a Partitioner and a Comparator, to get
> different grouping and sorting.
> There is an example in Hadoop's source code that deals with this sort of
> problem. Download the latest release of Hadoop (version 0.20.0)
> and check out src/examples/SecondarySort.java.
> BTW, KeyFieldBasedPartitioner and KeyFieldBasedComparator can also
> help you here; however, they have some bugs.

Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Posted by Min Zhou <co...@gmail.com>.
Hey Schubert,

You need at least two new classes, a Partitioner and a Comparator, to get
different grouping and sorting.
There is an example in Hadoop's source code that deals with this sort of
problem. Download the latest release of Hadoop (version 0.20.0)
and check out src/examples/SecondarySort.java.
BTW, KeyFieldBasedPartitioner and KeyFieldBasedComparator can also
help you here; however, they have some bugs.


On Mon, May 11, 2009 at 7:42 PM, zsongbo <zs...@gmail.com> wrote:

> Thanks Jothi,
> For example, I have a dataset with map key="city+userid+time". The mapper
> output is sorted by this map key.
>
> Then, at the reducer, I group by "city+userid" by defining
> my own OutputValueGroupingComparator,
> which compares only the "city+userid" part of the map key. I still want the
> records to be sorted by time within each group.
>
> It works fine.
>
> But to improve performance, I want to use a combiner that also
> groups by "city+userid" while the data stays sorted by "city+userid+time".
>
> I do not know whether this requirement is reasonable.
>
>
> Schubert


Min
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com

Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Posted by Min Zhou <co...@gmail.com>.
Oops, I misunderstood your problem. Before the combine step runs, the map
output keys have already been sorted (by a quicksort sorter by default)
according to the comparator you set with jobConf.setOutputKeyComparatorClass().
It is impossible to achieve your goal without modifying Hadoop's source code
and rebuilding it. Even then, it would make no sense.
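
To make the consequence concrete, here is a small plain-Java simulation (not
Hadoop code; all names are hypothetical). It assumes that a group is a run of
consecutive keys in the sorted output for which the comparator returns 0. With
the full-key sort comparator, every distinct "city+userid+time" key becomes
its own group, which is what the combiner sees; a "city+userid" grouping
comparator would merge them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CombinerGroupingDemo {
    // Hypothetical composite key mirroring "city+userid+time".
    static final class Key {
        final String city;
        final String userId;
        final long time;

        Key(String city, String userId, long time) {
            this.city = city;
            this.userId = userId;
            this.time = time;
        }
    }

    // Sort comparator: the full key (city, then userid, then time).
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.city)
                  .thenComparing(k -> k.userId)
                  .thenComparingLong(k -> k.time);

    // Grouping comparator: city and userid only; time is ignored.
    static final Comparator<Key> GROUP =
        Comparator.comparing((Key k) -> k.city)
                  .thenComparing(k -> k.userId);

    static List<Key> sampleKeys() {
        List<Key> keys = new ArrayList<>();
        keys.add(new Key("NYC", "u1", 3L));
        keys.add(new Key("NYC", "u1", 1L));
        keys.add(new Key("NYC", "u2", 2L));
        keys.add(new Key("NYC", "u1", 2L));
        return keys;
    }

    // Counts runs of consecutive keys that the given comparator treats as equal.
    static int countGroups(List<Key> sortedKeys, Comparator<Key> grouping) {
        int groups = 0;
        Key previous = null;
        for (Key key : sortedKeys) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = sampleKeys();
        keys.sort(SORT);
        System.out.println("groups with sort comparator: " + countGroups(keys, SORT));
        System.out.println("groups with grouping comparator: " + countGroups(keys, GROUP));
    }
}
```

In this sample the four keys are all distinct, so grouping by the sort
comparator gives four single-record groups and the combiner has nothing to
merge, while grouping by "city+userid" gives two groups.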


On Mon, May 11, 2009 at 7:42 PM, zsongbo <zs...@gmail.com> wrote:

> Thanks Jothi,
> For example, I have a dataset with map key="city+userid+time". The mapper
> output is sorted by this map key.
>
> Then, at the reducer, I group by "city+userid" by defining
> my own OutputValueGroupingComparator,
> which compares only the "city+userid" part of the map key. I still want the
> records to be sorted by time within each group.
>
> It works fine.
>
> But to improve performance, I want to use a combiner that also
> groups by "city+userid" while the data stays sorted by "city+userid+time".
>
> I do not know whether this requirement is reasonable.
>
>
> Schubert


Min

Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
Hi Schubert,

Currently, combiners use the OutputKeyComparator and not
OutputValueGroupingComparator.

If you think this functionality would be useful to you, you could file a
Jira for it and discuss it there.

Needless to say, you could always contribute your patch for the Jira too :)

Jothi


On 5/11/09 5:12 PM, "zsongbo" <zs...@gmail.com> wrote:

> Thanks Jothi,
> For example, I have a dataset with map key="city+userid+time". The mapper
> output is sorted by this map key.
> 
> Then, at the reducer, I group by "city+userid" by defining
> my own OutputValueGroupingComparator,
> which compares only the "city+userid" part of the map key. I still want the
> records to be sorted by time within each group.
> 
> It works fine.
> 
> But to improve performance, I want to use a combiner that also
> groups by "city+userid" while the data stays sorted by "city+userid+time".
> 
> I do not know whether this requirement is reasonable.
> 
> 
> Schubert


Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Posted by zsongbo <zs...@gmail.com>.
Thanks Jothi,
For example, I have a dataset with map key="city+userid+time". The mapper
output is sorted by this map key.

Then, at the reducer, I group by "city+userid" by defining
my own OutputValueGroupingComparator,
which compares only the "city+userid" part of the map key. I still want the
records to be sorted by time within each group.

It works fine.

But to improve performance, I want to use a combiner that also
groups by "city+userid" while the data stays sorted by "city+userid+time".

I do not know whether this requirement is reasonable.


Schubert
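
A minimal plain-Java sketch of the two comparators described above (it does
not use Hadoop's WritableComparable or RawComparator types; the class and
field names are assumptions made for illustration):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CompositeKeyDemo {
    // Hypothetical composite key mirroring "city+userid+time".
    static final class Key {
        final String city;
        final String userId;
        final long time;

        Key(String city, String userId, long time) {
            this.city = city;
            this.userId = userId;
            this.time = time;
        }

        @Override
        public String toString() {
            return city + "+" + userId + "+" + time;
        }
    }

    // Sort comparator: orders by the whole key, so time is ascending
    // within each (city, userid) pair.
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.city)
                  .thenComparing(k -> k.userId)
                  .thenComparingLong(k -> k.time);

    // Grouping comparator: two keys are "equal" when city and userid match.
    static final Comparator<Key> GROUP =
        Comparator.comparing((Key k) -> k.city)
                  .thenComparing(k -> k.userId);

    public static void main(String[] args) {
        List<Key> keys = new ArrayList<>();
        keys.add(new Key("NYC", "u1", 30L));
        keys.add(new Key("LA", "u2", 20L));
        keys.add(new Key("NYC", "u1", 10L));
        keys.sort(SORT);
        for (Key key : keys) {
            System.out.println(key);
        }
        // Keys that differ only in time fall into the same group.
        System.out.println("same group: " + (GROUP.compare(keys.get(1), keys.get(2)) == 0));
    }
}
```

The reducer applies the grouping comparator on top of data that is already
sorted by the full key, which is why the records of a group arrive in time
order.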

On Thu, May 7, 2009 at 7:53 PM, Jothi Padmanabhan <jo...@yahoo-inc.com> wrote:

> OutputValueGroupingComparator is used only at the reducer. AFAIK, you
> cannot have a different comparator for combiners.
>
> Jothi

Re: setGroupingComparatorClass() or setOutputValueGroupingComparator() does not work for Combiner

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
OutputValueGroupingComparator is used only at the reducer. AFAIK, you
cannot have a different comparator for combiners.

Jothi


On 5/7/09 3:32 PM, "zsongbo" <zs...@gmail.com> wrote:

> Hi all,
> I have an application in which sorting and grouping need to use
> different Comparators.
> 
> I tested this on 0.19.1 and 0.20.0, but in neither version does it work for
> the Combiner.
> 
> In 0.19.1 I use job.setOutputValueGroupingComparator(), and
> in 0.20.0 I use job.setGroupingComparatorClass().
> 
> This works in the reduce phase: the reducer groups the keys with the
> Comparator set above and sorts them with the default comparator of the key class.
> 
> But I also want the combiner to use a separate comparator for grouping,
> different from the one used for sorting. Is that possible?
> 
> Schubert