Posted to mapreduce-dev@hadoop.apache.org by Mark Vigeant <ma...@riskmetrics.com> on 2009/12/14 20:53:57 UTC

WordCount sort by frequency?

Hello!

I was wondering if there is a convenient way to sort the Reducer output.

Specifically, in WordCount is there a way to sort the results by frequency?

Thank you very much; I'm sorry if this is seen as a dumb question.

Mark Vigeant
RiskMetrics Group, Inc.


This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not an intended recipient, please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachments to the original message.

Re: WordCount sort by frequency?

Posted by Jason Venner <ja...@gmail.com>.
The simple way is to run another M/R pass over the resulting word count
dataset.
The map can swap the word and the count so that the count is the key and the
word is the value; the framework then sorts the map output by key (the count)
before it reaches the reducer.
The reduce may be the identity reducer.
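
The second pass described above can be sketched locally as follows. This is a plain-Python simulation of the map, shuffle/sort, and reduce steps, not the Hadoop API; the function names swap_map, identity_reduce, and sort_by_frequency are hypothetical. The descending flag is an assumption added for illustration: Hadoop sorts keys in ascending order by default, so listing the most frequent words first on a real cluster would need a reversed sort comparator. With more than one reducer, each output file is sorted only within itself.

```python
from itertools import groupby
from operator import itemgetter


def swap_map(word, count):
    """Map of the second pass: emit (count, word) so the count becomes the key."""
    yield count, word


def identity_reduce(key, values):
    """Identity reducer: pass every (key, value) pair through unchanged."""
    for value in values:
        yield key, value


def sort_by_frequency(word_counts, descending=False):
    """Hypothetical local simulation of the second M/R pass.

    word_counts is an iterable of (word, count) pairs, i.e. WordCount output.
    descending is an illustrative assumption, standing in for a reversed
    sort comparator on a real cluster.
    """
    # Map phase: swap every (word, count) pair into (count, word).
    mapped = [pair for word, count in word_counts for pair in swap_map(word, count)]
    # Shuffle/sort phase: order map output by key only (Python's sort is stable).
    mapped.sort(key=itemgetter(0), reverse=descending)
    # Reduce phase: group consecutive equal keys and apply the identity reducer.
    output = []
    for count, group in groupby(mapped, key=itemgetter(0)):
        output.extend(identity_reduce(count, (word for _, word in group)))
    return output
```

For example, sort_by_frequency([("the", 4), ("hadoop", 2), ("sort", 1), ("a", 4)]) returns the pairs ordered by count, smallest first. Hadoop itself does not guarantee the order of values that share a key; this sketch keeps input order only because Python's sort is stable.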

-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals