Posted to common-dev@hadoop.apache.org by Dennis Kubes <ku...@apache.org> on 2007/10/09 20:13:12 UTC

Bug in either ToolBase or ToolRunner or Nutch jobs

I don't know if this bug is in the way ToolRunner works, ToolBase works, 
or the way Nutch implements some of its jobs, but here is the scenario.

Many Nutch jobs (Injector, for instance) use ToolBase and call the 
doMain(Configuration conf, String[] args) method to run.  ToolBase now 
calls ToolRunner as return ToolRunner.run(this, args);  The problem is 
that the configuration object passed in to ToolBase is not set as 
the conf object in ToolBase and so is essentially ignored by ToolRunner. 
As a result, any Nutch resources are ignored.

The solution to this is pretty simple:

   public final int doMain(Configuration conf, String[] args)
       throws Exception {
     setConf(conf);
     return ToolRunner.run(this, args);
   }
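
To make the scenario concrete, here is a minimal sketch of the behaviour described above. It does not use the real Hadoop classes: Conf, ToolBaseModel, runViaRunner, the two doMain variants, and the key "example.key" are all hypothetical stand-ins, and the runner is assumed to see only whatever getConf() returns.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only, NOT the real Hadoop classes.
public class DoMainSketch {

    /** Stand-in for Configuration: just a map of properties. */
    static class Conf {
        final Map<String, String> props = new HashMap<>();
    }

    /** Stand-in for ToolBase; holds the conf that the runner will see. */
    static class ToolBaseModel {
        private Conf conf = new Conf(); // default conf, no extra resources

        void setConf(Conf conf) { this.conf = conf; }
        Conf getConf() { return conf; }

        // Assumed stand-in for ToolRunner.run(this, args):
        // it only ever looks at getConf().
        private String runViaRunner() {
            return getConf().props.getOrDefault("example.key", "<unset>");
        }

        // Behaviour described in the message: the argument is never stored,
        // so the runner works with the default conf.
        String doMainIgnoringConf(Conf passedIn) {
            return runViaRunner();
        }

        // Proposed fix: store the conf before delegating to the runner.
        String doMainWithFix(Conf passedIn) {
            setConf(passedIn);
            return runViaRunner();
        }
    }

    public static void main(String[] args) {
        Conf nutchConf = new Conf();
        nutchConf.props.put("example.key", "from-nutch");

        // First call drops the passed-in conf, second call keeps it.
        System.out.println(new ToolBaseModel().doMainIgnoringConf(nutchConf));
        System.out.println(new ToolBaseModel().doMainWithFix(nutchConf));
    }
}
```

In this model the only difference between the two variants is the setConf call, which is the one-line change proposed above.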

But since we are moving away from ToolBase, I don't know whether there is a 
better solution for this. For example, should the current Nutch jobs be 
moved over to ToolRunner instead, or should we make this simple change 
now for compatibility while we move the jobs to ToolRunner?  Any guidance 
is appreciated.

Dennis Kubes

Re: Bug in either ToolBase or ToolRunner or Nutch jobs

Posted by Enis Soztutar <en...@gmail.com>.
Hi Dennis,

ToolRunner runs the tools given to it by first modifying the 
configuration and then passing it to the tool. We have two methods: 
ToolRunner#run(Tool, String[]) and ToolRunner#run(Configuration, Tool, 
String[]). The former delegates to the latter using tool.getConf(), and 
the latter instantiates a configuration if null is passed, and then calls 
tool#setConf().
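
The delegation between the two overloads can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual ToolRunner source: Conf, Tool, and RecordingTool are hypothetical stand-ins defined here, the "default" label is invented for illustration, and the step where the real class modifies the configuration before handing it over is left out.

```java
// Simplified model for illustration only, NOT the real Hadoop classes.
public class ToolRunnerSketch {

    /** Stand-in for Configuration; the label only shows which conf was used. */
    static class Conf {
        final String label;
        Conf(String label) { this.label = label; }
    }

    /** Stand-in for the Tool interface. */
    interface Tool {
        void setConf(Conf conf);
        Conf getConf();
        int run(String[] args);
    }

    // Two-argument form: delegates using whatever conf the tool already holds.
    static int run(Tool tool, String[] args) {
        return run(tool.getConf(), tool, args);
    }

    // Three-argument form: creates a conf if none was given, hands it to
    // the tool, then runs the tool.
    static int run(Conf conf, Tool tool, String[] args) {
        if (conf == null) {
            conf = new Conf("default");
        }
        // The real class also adjusts the conf at this point; omitted here.
        tool.setConf(conf);
        return tool.run(args);
    }

    /** Hypothetical tool that remembers which conf it ran with. */
    static class RecordingTool implements Tool {
        private Conf conf;   // null until someone calls setConf
        String seenLabel;

        public void setConf(Conf conf) { this.conf = conf; }
        public Conf getConf() { return conf; }
        public int run(String[] args) {
            seenLabel = conf.label;
            return 0;
        }
    }

    public static void main(String[] args) {
        RecordingTool a = new RecordingTool();
        run(a, args);                       // tool has no conf of its own
        System.out.println(a.seenLabel);

        RecordingTool b = new RecordingTool();
        run(new Conf("nutch"), b, args);    // caller supplies the conf
        System.out.println(b.seenLabel);
    }
}
```

In this model, a tool that was never given a configuration ends up with a freshly created one when the two-argument form is used, while the three-argument form lets the caller decide which configuration the tool sees.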

In most cases, Hadoop's internal classes prefer the simpler 
ToolRunner#run(Tool, String[]), but I bet Nutch would prefer to use 
the form:

public static void main(String argv[]) throws Exception {
    int res = ToolRunner.run(NutchConfiguration.create(), new ToolImp(),
        argv);
    System.exit(res);
}

Hope this solves the problem.
