Posted to common-user@hadoop.apache.org by Chandrashekhar Kotekar <sh...@gmail.com> on 2013/04/24 07:53:22 UTC

Fwd: Multiple ways to write Hadoop program driver - Which one to choose?

Hi,


I have observed that there are multiple ways to write the driver method of a
Hadoop program.

The following method is given in the Yahoo Hadoop Tutorial
<http://developer.yahoo.com/hadoop/tutorial/module4.html>:

 public void run(String inputPath, String outputPath) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.addInputPath(conf, new Path(inputPath));
    FileOutputFormat.setOutputPath(conf, new Path(outputPath));

    JobClient.runJob(conf);
  }

and this method is given in Hadoop: The Definitive Guide (O'Reilly, 2012):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  Job job = new Job();
  job.setJarByClass(MaxTemperature.class);
  job.setJobName("Max temperature");
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

While trying the program given in the O'Reilly book, I found that the
constructors of the Job class are deprecated. As the O'Reilly book is based
on Hadoop 2 (YARN), I was surprised to see that they used a deprecated class.

I would like to know which method everyone uses?







Regards,
Chandrash3khar K0tekar
Mobile - 8884631122

Re: Multiple ways to write Hadoop program driver - Which one to choose?

Posted by yypvsxf19870706 <yy...@gmail.com>.
Hi Shekhar,

   The Job constructor is indeed deprecated, according to the Job source code.
   There is another constructor which is not deprecated; you can find it in the hint raised by Eclipse.
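For what it's worth, in the Hadoop 2 API the usual replacement for the deprecated constructors is (as far as I know) the static factory Job.getInstance(Configuration, String). A minimal self-contained mock of that pattern — hypothetical MockJob/MockConfiguration classes, not the real org.apache.hadoop.mapreduce ones:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mock of the Hadoop 2 pattern — NOT the real
// org.apache.hadoop.mapreduce.Job, just the same shape: the no-arg
// constructor is deprecated and a static factory takes its place.
class MockConfiguration {
    final Map<String, String> props = new HashMap<String, String>();
}

class MockJob {
    private final MockConfiguration conf;
    private final String name;

    @Deprecated
    public MockJob() { // old style: new Job(), flagged by the compiler
        this(new MockConfiguration(), "");
    }

    private MockJob(MockConfiguration conf, String name) {
        this.conf = conf;
        this.name = name;
    }

    // new style: a factory can prepare (copy, validate) the configuration
    // before the job object exists, which a constructor chain cannot
    public static MockJob getInstance(MockConfiguration conf, String name) {
        return new MockJob(conf, name);
    }

    public String getJobName() { return name; }
}
```

With the real API this would correspond to something like Job.getInstance(new Configuration(), "max temperature") in place of new Job().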

  




Sent from my iPhone

On 2013-4-24, at 13:53, Chandrashekhar Kotekar <sh...@gmail.com> wrote:

> I would like to know which method everyone uses?


Re: Multiple ways to write Hadoop program driver - Which one to choose?

Posted by Jens Scheidtmann <je...@gmail.com>.
Dear Chandrash3khar K0tekar,

Using the run() method implies implementing Tool and using ToolRunner. This
gives the additional benefit that some "standard" Hadoop command-line
options are available. See here:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/util/ToolRunner.java

Best regards,

Jens
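To make that concrete, here is a self-contained sketch of what ToolRunner does with those standard options. These are mocked classes (MockTool, MockToolRunner), not the real org.apache.hadoop.util ones: the runner peels "-D key=value" pairs off into a configuration map before the tool's run() sees the remaining arguments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Mocked sketch of the Tool/ToolRunner contract (not the real
// org.apache.hadoop.util classes): the runner strips generic
// "-D key=value" options into a shared configuration map, then calls
// the tool's run() with whatever arguments remain.
interface MockTool {
    int run(Map<String, String> conf, String[] args);
}

class MockToolRunner {
    static int run(MockTool tool, String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        List<String> remaining = new ArrayList<String>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2); // -D name=value
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return tool.run(conf, remaining.toArray(new String[0]));
    }
}
```

The real ToolRunner (via GenericOptionsParser) also understands options such as -conf, -fs and -jt, and hands your tool the site configuration, which is why implementing Tool is often the recommended driver style.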



