Posted to common-dev@hadoop.apache.org by Gopal Gandhi <go...@yahoo.com> on 2008/07/21 23:35:45 UTC

[Streaming] I figured out a way to do combining using mapper, would anybody check it?

I am using Hadoop Streaming.
I figured out a way to do the combining inside the mapper. Is it the same as using a separate combiner?

For example: the input is a list of words, one per line, and I want to count how many times each word occurs.
The traditional mapper is:

while (<STDIN>) {
  chomp ($_);
  $word = $_;
  # emit one "word<TAB>1" record per input line
  print "$word\t1\n";
}
........

Instead of using an additional combiner, I modify the mapper to use a hash:

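%hash = ();
while (<STDIN>) {
  chomp ($_);
  $word = $_;
  # accumulate the count in memory instead of emitting 1 per line
  $hash{$word} ++;
}

# iterate over the keys only; a bare %hash would yield keys and values
foreach $key (keys %hash){
  print "$key\t$hash{$key}\n";
}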

Is it the same as using a separate combiner?
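
For comparison, the separate combiner step asked about here could be sketched roughly as below: a script that reads "word<TAB>count" records, as emitted by the traditional mapper, and sums the counts per word. This is only an illustration. The helper name sum_counts and the sample records are hypothetical, and the sketch assumes tab-separated input with exactly one count per line. Whether a streaming job can run a script like this as its combiner depends on the Hadoop version, which is not assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): sum the counts
# for each word from "word<TAB>count" lines read from a filehandle.
sub sum_counts {
    my ($fh) = @_;
    my %total;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                      # skip blank lines
        my ($word, $count) = split /\t/, $line, 2;
        $total{$word} += $count;
    }
    return \%total;
}

# Made-up sample records standing in for mapper output. In a real
# streaming job the script would read \*STDIN instead of $in.
my $sample = "apple\t1\nbanana\t1\napple\t2\n";
open my $in, '<', \$sample or die "cannot open in-memory handle: $!";

my $total = sum_counts($in);
# Prints "apple<TAB>3" then "banana<TAB>1"
print "$_\t$total->{$_}\n" for sort keys %$total;
```

On this sample, the script produces the same per-word totals that the hash-based mapper would emit for the corresponding input lines; the difference is only where the summing happens.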