Posted to mapreduce-dev@hadoop.apache.org by HU Wenjing A <We...@alcatel-sbell.com.cn> on 2012/05/28 08:50:27 UTC

hadoop streaming using java as mapper & reducer

Hi all,

I am new to Hadoop, and recently I have wanted to use Hadoop streaming to run Java programs as the mapper and reducer
(because I want to use Hadoop streaming to port some existing Java programs that process XML files).
To try it out, I first used the Hadoop wordcount example (as follows):

countm.java:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Classic-API mapper: emits (word, 1) for every token in each input line.
public class countm extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}


countr.java:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Classic-API reducer: sums the counts emitted for each word.
public class countr extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Then I executed the following command:
bin/hadoop jar mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -files test/countr.class -files test/countm.class -mapper test/countm.class  -reducer test/countr.class  -input input -output output
Then I got the following output:
12/05/27 16:48:29 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/05/27 16:48:30 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/05/27 16:48:30 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
packageJobJar: [/root/hadoop-0.21.0/tmp/hadoop-unjar7827631350430711602/] [] /tmp/streamjob2931278447922230194.jar tmpDir=null
12/05/27 16:48:31 INFO mapred.FileInputFormat: Total input paths to process : 1
12/05/27 16:48:31 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: number of splits:19
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
12/05/27 16:48:31 INFO streaming.StreamJob: getLocalDirs(): [/root/hadoop-0.21.0/tmp/mapred/local]
12/05/27 16:48:31 INFO streaming.StreamJob: Running job: job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:48:31 INFO streaming.StreamJob: /root/hadoop-0.21.0/bin/../bin/hadoop job  -Dmapreduce.jobtracker.address=192.168.204.130:9001 -kill job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:48:32 INFO streaming.StreamJob:  map 0%  reduce 0%
12/05/27 16:49:59 INFO streaming.StreamJob:  map 100%  reduce 100%
12/05/27 16:49:59 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:49:59 INFO streaming.StreamJob: /root/hadoop-0.21.0/bin/../bin/hadoop job  -Dmapreduce.jobtracker.address=192.168.204.130:9001 -kill job_201205242131_0013
12/05/27 16:49:59 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:49:59 ERROR streaming.StreamJob: Job not Successful!
12/05/27 16:49:59 INFO streaming.StreamJob: killJob...
Streaming Command Failed!


I am not sure whether I am using it the wrong way, or whether something special needs to be done to use Java with Hadoop streaming.
It would be great if you could help me with this, as I cannot move any further without overcoming it.
Thanks in advance.  : )

Thanks & best regards,
wenjing

Re: hadoop streaming using java as mapper & reducer

Posted by Keith Wiley <kw...@keithwiley.com>.
I'm confused.  First, most people don't use streaming in conjunction with Java, since Hadoop supports Java directly...although I think you were saying, in the parenthetical comment at the top of your post, that you may have a legitimate reason for doing that.  Second, in your code you appear to be creating a "classic" MapReduce program, not a streaming one.  I say this because your mapper is a typical mapper, extending Mapper from the Hadoop API.  I would expect a streaming mapper to be a plain top-level class, not a derivation from the API.  Furthermore, it would gather its input not through an overridden map() method but from stdin, and likewise it would send its output to stdout.
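For illustration, a streaming-style word-count mapper along the lines described above might look roughly like the sketch below. This is a minimal illustration, not code from the thread: the class name StreamWordCountMapper is hypothetical, and it assumes streaming's usual convention of one record per stdin line and tab-separated key/value pairs on stdout.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

// Hypothetical streaming mapper: a plain top-level class with no Hadoop API.
// Streaming delivers input records on stdin, one line per record, and expects
// tab-separated key/value pairs on stdout.
public class StreamWordCountMapper {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Emit "word<TAB>1"; the framework sorts and groups by key
                // before the reduce stage.
                System.out.println(tokenizer.nextToken() + "\t1");
            }
        }
    }
}

A streaming reducer would have the same shape: read the already-sorted "word<TAB>1" lines from stdin, sum each run of identical keys, and print "word<TAB>count" to stdout. The job would then be launched with something like -mapper 'java StreamWordCountMapper' (assuming the compiled class is shipped to the tasks and is on their classpath), rather than by passing a .class file as the -mapper argument.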

Am I completely misunderstanding your situation?
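For completeness, the direct (non-streaming) route mentioned above would use an ordinary driver class, roughly like this sketch. This too is a minimal illustration, not code from the thread: the class name WordCountDriver is hypothetical, and it assumes countm and countr are in the default package and bundled with the driver into a jar run via bin/hadoop jar.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver for the classic-API word count above; run with e.g.:
//   bin/hadoop jar wordcount.jar WordCountDriver input output
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(countm.class);
        job.setReducerClass(countr.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}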

On May 27, 2012, at 23:50 , HU Wenjing A wrote:

> Hi all,
> 
> I am new to Hadoop, and recently I have wanted to use Hadoop streaming to run Java programs as the mapper and reducer
> (because I want to use Hadoop streaming to port some existing Java programs that process XML files).
> To try it out, I first used the Hadoop wordcount example (as follows):
> 
> [... countm mapper code snipped; quoted in full above ...]


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
                                           --  Homer Simpson
________________________________________________________________________________