Posted to common-user@hadoop.apache.org by tim robertson <ti...@gmail.com> on 2008/03/09 12:54:32 UTC

Newbie reducer question

Hi all,
I am a day-one newbie investigating distributed processing for the first time...

I have run through the tutorials with ease (thanks for the nice
documentation) and have now written my first MapReduce job.

Is it accurate to say that the reduce is called repeatedly by the Hadoop
framework until the number of inputs equals the number of outputs?

I am only running in single-server mode at the moment, and my map
outputs are:

Football UK
Football UK
Rugby UK
American Football USA
Rugby FR
Football FR

And reduce outputs:

Football UK, FR
Rugby UK, FR
American Football USA

This worked fine.

But when I tried to include the counts in the output, I got some strange
results:

Football UK(2), FR(1)(1)
Rugby UK(1), FR(1)(1)
American Football USA(1)(1)

I think it was because I was just doing String manipulation in the reducer
to produce the counts.
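
To make the suspicion above concrete, here is a minimal plain-Java sketch (no Hadoop classes) of a string-based reduce being applied twice to the same key. It rests on an assumption that is not confirmed anywhere in this post: that the reduce logic runs more than once over the data, for example because the reducer class is also registered as the combiner. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StringReduceSketch {

    // Hypothetical string-based reduce: counts each distinct value and
    // appends "(count)" to it, joining the results with ", ".
    public static String reduce(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            parts.add(e.getKey() + "(" + e.getValue() + ")");
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        // First pass over the raw map outputs for the key "Football".
        String once = reduce(Arrays.asList("UK", "UK", "FR"));
        // Second pass (assumed) treats the first pass's output as one opaque value.
        String twice = reduce(Arrays.asList(once));
        System.out.println(once);   // UK(2), FR(1)
        System.out.println(twice);  // UK(2), FR(1)(1)
    }
}
```

Under that assumption, the second pass counts the already-formatted string as a single value and appends another "(1)", which matches the shape of the strange output above.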

I presume, then, that I should not use the Text type and should instead
define my own type for the Country+Count pair?
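
As a rough idea of what such a type could look like, here is a sketch of a hypothetical CountryCount value class. It is written in plain Java so it compiles without Hadoop; the write and readFields methods have the same shape as those of Hadoop's Writable interface, and in a real job the class would presumably also declare "implements org.apache.hadoop.io.Writable". The merge and roundTrip helpers are illustrative assumptions, not part of any Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type pairing a country with a running count.
public class CountryCount {
    public String country = "";
    public int count;

    public CountryCount() {}

    public CountryCount(String country, int count) {
        this.country = country;
        this.count = count;
    }

    // Same method shapes as Hadoop's Writable interface.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(country);
        out.writeInt(count);
    }

    public void readFields(DataInput in) throws IOException {
        country = in.readUTF();
        count = in.readInt();
    }

    // Sums counts per country, so running it on its own output changes nothing.
    public static Map<String, Integer> merge(List<CountryCount> values) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (CountryCount v : values) {
            totals.merge(v.country, v.count, Integer::sum);
        }
        return totals;
    }

    // Serializes and deserializes a value in memory, to check the two methods agree.
    public static CountryCount roundTrip(CountryCount in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            in.write(new DataOutputStream(bytes));
            CountryCount out = new CountryCount();
            out.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> first = merge(Arrays.asList(
                new CountryCount("UK", 1), new CountryCount("UK", 1), new CountryCount("FR", 1)));
        System.out.println(first); // {UK=2, FR=1}
        // Feeding the partial totals back in yields the same totals.
        Map<String, Integer> second = merge(Arrays.asList(
                new CountryCount("UK", 2), new CountryCount("FR", 1)));
        System.out.println(second); // {UK=2, FR=1}
    }
}
```

Because the count travels as a number rather than as text inside the value, summing partial results gives the same totals no matter how many times the merge step runs.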

Thanks,

Tim