Posted to user@hadoop.apache.org by "Ruiz, Pierre" <Pi...@analog.com> on 2018/02/27 11:04:09 UTC

passing conf instance around

Hi all,

I'm quite new to Hadoop and have only worked with a single-node setup so far.  I wrote a local driver that submits jobs to my cluster.  I instantiate a single Configuration instance right at the start of my process and pass it around like this:

public static void main(String[] args) {
  int exitCode = ERR_INVALID_ARGS.get();
  Configuration conf = new Configuration(true);

  try {
    // A leading '-' means the user asked for a specific tool.
    if (args.length > 0 && '-' == args[0].charAt(0)) {
      exitCode = runSelectedTool(conf, args);
    } else if (args.length >= 2 && args.length <= 3) {
      exitCode = crunchFullDataset(conf, args);
    }
  } catch (Exception e) {
    e.printStackTrace();
    exitCode = ERR_FATAL.get();
  }

  System.exit(exitCode);
}

private static int runSelectedTool(Configuration conf, String[] args) throws Exception {
    // Default to the "invalid args" code in case the switch is unrecognised.
    int exitCode = ERR_INVALID_ARGS.get();
    String toolSwitch = args[0];
    args = Arrays.copyOfRange(args, 1, args.length);

    if (SWITCH_FORMATTER_COUNTER.equals(toolSwitch)) {
        exitCode = ToolRunner.run(conf, new FormatterCounter(), args);
    } else if (SWITCH_CANDIDATES_FILTER.equals(toolSwitch)) {
        exitCode = ToolRunner.run(conf, new CandidatesFilter(), args);
    }
    return exitCode;
}

Prior to this, I was instantiating a new conf object each time I called ToolRunner.run(), but now I use conf.set() and get() to pass values between jobs.  Is this a bad idea (and why), or is this the right way to proceed?
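
To be concrete, by "passing values between jobs" I mean something like the sketch below (the property key is made up just for this example, not from my real code):

// Sketch only: the first tool's result is written into the shared conf,
// and a later tool reads it back from the same instance.
conf.set("myapp.stage1.output.path", "/tmp/stage1");            // hypothetical key
exitCode = ToolRunner.run(conf, new FormatterCounter(), args);  // first job
String stage1Output = conf.get("myapp.stage1.output.path");     // read back later
exitCode = ToolRunner.run(conf, new CandidatesFilter(), args);  // next job uses the same conf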

Many thanks,
Pierre