Posted to common-user@hadoop.apache.org by Premal Shah <pr...@gmail.com> on 2011/08/04 22:34:48 UTC

Hadoop Streaming Combiner Problem

According to the Hadoop Streaming
docs<http://hadoop.apache.org/common/docs/r0.20.0/streaming.html#Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29>,
there is a built-in Aggregate package whose Java classes can work both as a
combiner and as a reducer.
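
For context, the Aggregate package expects each mapper output record to
name an aggregation function, in the form "function:key<TAB>value". The
actual mapper.py is not shown in this post, so the following is only a
minimal sketch of what such a mapper might look like, assuming a word count
that uses the LongValueSum function from the docs. The helper names
aggregate_record and map_lines are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical sketch of a streaming mapper for the Aggregate package.
# Assumption: we count words and sum the counts with LongValueSum.
import sys


def aggregate_record(key, value=1, function="LongValueSum"):
    # Aggregate expects "function:key<TAB>value" on each output line.
    return "%s:%s\t%d" % (function, key, value)


def map_lines(lines):
    # Emit one aggregate record per whitespace-separated word.
    for line in lines:
        for word in line.split():
            yield aggregate_record(word)


if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        print(record)
```

With output in this shape, the aggregate reducer can sum the values for each
key without any custom reduce code.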

Here is the command:
*shell> hadoop jar hadoop-streaming.jar -file mapper.py -mapper mapper.py
-combiner aggregate -reducer NONE -input input_files -output output_path*

With this command, the map tasks fail with this error:
*java.io.IOException: Cannot run program "aggregate": java.io.IOException:
error=2, No such file or directory*

However, if you run this command using aggregate as the reducer and not the
combiner, the job works fine.
*shell> hadoop jar hadoop-streaming.jar -file mapper.py -mapper mapper.py
-reduce aggregate -input input_files -output output_path*

What am I doing wrong? Is aggregate being treated as a command to execute
rather than as a JavaClassName? If so, how do I use the JavaClassName instead?

-- 
Regards,
Premal Shah.