Posted to common-user@hadoop.apache.org by Rob Stewart <ro...@googlemail.com> on 2010/01/15 23:42:54 UTC

Quick Clarification of sort mechanism

Hi,

I am looking at the WordCount Java example here:
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Walk-through

I want a word count application that sorts by the count (frequency) of the
words instead of by key (alphabetically by word).

I can't see where in the reduce method of the above example the key/values
are specified to be ordered alphabetically by key, or how I can override
this to sort by the value of the final reduce (i.e. by the frequency).

Thanks,

Rob Stewart

Re: Quick Clarification of sort mechanism

Posted by Jeff Zhang <zj...@gmail.com>.

Hi Rob,

Sorting is an internal mechanism in Hadoop: the framework always sorts the
map output by key before it is passed to the reduce step, so it is not
specified anywhere in the reduce method.
If you want to sort the result by count, you could start a second job that
takes the output of the first job as its input and uses the count as the
key and the word as the value.

-- 
Best Regards

Jeff Zhang