Posted to hdfs-user@hadoop.apache.org by Vitaliy Semochkin <vi...@gmail.com> on 2010/07/25 18:17:15 UTC

What output types should a Combiner use?

Hi,

Am I right that combiners are supposed to return key/value types that
reducers expect as an input?

Let's say I have a MapReduce job that counts the number of distinct IPs
that visited a resource.

I have a log:
name1 ip1
name2 ip2
name1 ip2
...

The map step produces pairs (Text resourceName, Text ip).

The reducer produces pairs (Text resourceName, IntWritable numberOfVisits).

Am I right that the combiner should return (Text resourceName, Text ip) pairs?

If so, what can be optimized in the combiner besides removing repeated IPs
(if removing them gives any benefit at all)?
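
To make the question concrete, here is a rough sketch of the data flow I have in mind. It is plain Java with no Hadoop classes: the class name DistinctIpSketch and the methods map, combine and reduce are hypothetical stand-ins, with String in place of Text and Integer in place of IntWritable. The combiner here takes and returns the same (resourceName, ip) pairs and only drops duplicates within one map task's output, which is my assumption about what it is allowed to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the job; not real Hadoop API.
class DistinctIpSketch {

    // Map step: one "name ip" log line becomes one (resourceName, ip) pair.
    static List<String[]> map(List<String> logLines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : logLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                pairs.add(parts);
            }
        }
        return pairs;
    }

    // Combine step: sees the output of a single map task only.
    // Input and output are both (resourceName, ip) pairs; duplicates are dropped.
    static List<String[]> combine(List<String[]> pairs) {
        Set<String> seen = new LinkedHashSet<>();
        List<String[]> out = new ArrayList<>();
        for (String[] p : pairs) {
            if (seen.add(p[0] + "\t" + p[1])) {
                out.add(p);
            }
        }
        return out;
    }

    // Reduce step: receives pairs from all map tasks, so it deduplicates
    // again per resource before counting.
    static Map<String, Integer> reduce(List<String[]> pairs) {
        Map<String, Set<String>> ipsByResource = new LinkedHashMap<>();
        for (String[] p : pairs) {
            ipsByResource.computeIfAbsent(p[0], k -> new LinkedHashSet<>()).add(p[1]);
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : ipsByResource.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two simulated map tasks, each followed by its own combiner run.
        List<String[]> task1 = combine(map(List.of("name1 ip1", "name1 ip1", "name2 ip2")));
        List<String[]> task2 = combine(map(List.of("name1 ip2", "name1 ip1")));

        List<String[]> shuffled = new ArrayList<>(task1);
        shuffled.addAll(task2);

        System.out.println(reduce(shuffled)); // prints {name1=2, name2=1}
    }
}
```

In this sketch the first task sends 2 pairs to the reducer instead of 3, so the only saving is in the amount of data shuffled. The reducer still has to deduplicate, because the same (resourceName, ip) pair can arrive from different map tasks.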

Thanks in advance,
Vitaliy S