Posted to common-user@hadoop.apache.org by Kevin Tse <ke...@gmail.com> on 2010/06/14 04:16:59 UTC

Is it possible to sort values before they are sent to the reduce function?

Hi,
For each key there might be millions of values (LongWritable), but I only
want to emit the top 20 of them, sorted in descending order.
So is it possible to sort these values before they enter the reduce phase?

Thank you in advance!
Kevin

Re: Is it possible to sort values before they are sent to the reduce function?

Posted by Kevin Tse <ke...@gmail.com>.
Hi Alex,
I am reading Tom's book, but I had not reached chapter 6 yet. I just
read it, and it is really helpful.
Thank you for mentioning it, and thanks also go to Tom.

Kevin

On Mon, Jun 14, 2010 at 10:22 AM, Alex Kozlov <al...@cloudera.com> wrote:

> Hi Kevin, This is a very common technique.  Look for secondary sort in Tom
> White's HTDG (Chapter 6).  You'll most likely have to write your own
> Partitioner and WritableComparator.  -- Alex K
>
> On Sun, Jun 13, 2010 at 7:16 PM, Kevin Tse <ke...@gmail.com>
> wrote:
>
> > Hi,
> > For each key, there might be millions of values(LongWritable), but I only
> > want to emit top 20 of these values which I want to be sorted in descending
> > order.
> > So is it possible to sort these values before they enter the reduce phase?
> >
> > Thank you in advance!
> > Kevin
> >
>

Re: Is it possible to sort values before they are sent to the reduce function?

Posted by Alex Kozlov <al...@cloudera.com>.
Hi Kevin, This is a very common technique.  Look for "secondary sort" in Tom
White's Hadoop: The Definitive Guide (Chapter 6).  You'll most likely have to
write your own Partitioner and WritableComparator.  -- Alex K
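
As a rough illustration of the secondary-sort idea mentioned above, here is a
minimal plain-Java sketch. It deliberately does not use the Hadoop API: the
class SecondarySortSketch, the CompositeKey type and the topPerKey method are
hypothetical names made up for this example, and the shuffle is only simulated
in memory. The point is to show the three roles involved: partition on the
natural key only, sort on the composite (key, value) pair with values
descending, then group by natural key and keep the first 20 (here: `limit`)
values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of secondary sort; no Hadoop classes are used. */
final class SecondarySortSketch {

    /** Composite key: the natural key plus the value we want sorted. */
    static final class CompositeKey {
        final String naturalKey;
        final long value;

        CompositeKey(String naturalKey, long value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    /** Sort-comparator role: natural key ascending, then value descending. */
    static final Comparator<CompositeKey> SORT_ORDER = (a, b) -> {
        int byKey = a.naturalKey.compareTo(b.naturalKey);
        return byKey != 0 ? byKey : Long.compare(b.value, a.value);
    };

    /** Partitioner role: route on the natural key only, ignoring the value. */
    static int partition(CompositeKey key, int numPartitions) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Simulates shuffle, sort and grouping, keeping at most `limit` values per key. */
    static Map<String, List<Long>> topPerKey(
            List<CompositeKey> mapOutput, int numPartitions, int limit) {
        // "Shuffle": distribute map output across partitions.
        List<List<CompositeKey>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (CompositeKey key : mapOutput) {
            partitions.get(partition(key, numPartitions)).add(key);
        }

        Map<String, List<Long>> result = new TreeMap<>();
        for (List<CompositeKey> part : partitions) {
            // "Sort": each partition is ordered by the composite key.
            part.sort(SORT_ORDER);
            // "Group + reduce": values arrive already in descending order,
            // so the reducer just takes the first `limit` per natural key.
            for (CompositeKey key : part) {
                List<Long> values =
                        result.computeIfAbsent(key.naturalKey, k -> new ArrayList<>());
                if (values.size() < limit) {
                    values.add(key.value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
                new CompositeKey("a", 5), new CompositeKey("b", 7),
                new CompositeKey("a", 42), new CompositeKey("a", 17),
                new CompositeKey("b", 99));
        System.out.println(topPerKey(mapOutput, 2, 2));
    }
}
```

In an actual job the same three roles would presumably be filled by a custom
Partitioner, a sort comparator and a grouping comparator over a composite
Writable key, with the reducer stopping after the first 20 values it sees.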

On Sun, Jun 13, 2010 at 7:16 PM, Kevin Tse <ke...@gmail.com> wrote:

> Hi,
> For each key, there might be millions of values(LongWritable), but I only
> want to emit top 20 of these values which I want to be sorted in descending
> order.
> So is it possible to sort these values before they enter the reduce phase?
>
> Thank you in advance!
> Kevin
>