Posted to common-user@hadoop.apache.org by Rajarshi Guha <rg...@indiana.edu> on 2009/05/04 23:59:13 UTC
specifying command line args, but getting an NPE
Hi, I have a Hadoop program in which main() reads in some command line
args:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 3) {
        System.err.println("Usage: subsearch <in> <out> <pattern>");
        System.exit(2);
    }
    Job job = new Job(conf, "subsearch");
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    pattern = otherArgs[2];
    ....
}
Here pattern is declared as a static String class variable.
When I run the program using the local tracker, it runs fine and uses
the value of pattern. However, if I run the code in distributed mode,
I get a NullPointerException - as far as I can tell, pattern is
turning out to be null in this case.
If I hard-code the value of pattern into the code that uses it, the
program runs fine.
So my question is: if I need to use an argument, specified on the
command line, do I need to do anything special to the variable holding
it? In other words, the simple assignment
pattern = otherArgs[2];
seems to lead to an NPE when run in distributed mode.
Any pointers would be appreciated.
Thanks,
-------------------------------------------------------------------
Rajarshi Guha <rg...@indiana.edu>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Q: What's polite and works for the phone company?
A: A deferential operator.
Re: specifying command line args, but getting an NPE
Posted by Rajarshi Guha <rg...@indiana.edu>.
On May 4, 2009, at 6:07 PM, Todd Lipcon wrote:
> The issue here is that your mapper and reducer classes are being
> instantiated in a different JVM from your main() function. In order
> to pass data to them, you need to use the Configuration object.
>
> Since you have a simple String here, this should be pretty simple.
> Something like:
>
> conf.set("com.example.tool.pattern", otherArgs[2]);
>
> then in the configure() function of your Mapper/Reducer, simply
> retrieve it using conf.get("com.example.tool.pattern");
Thanks for the pointer. I'm using Hadoop 0.20.0, and my mapper, which
extends Mapper<Object, Text, Text, IntWritable>, doesn't seem to have
a configure() method.
Looking at the API I see the superclass has a setup method. Thus in my
class I do:
public static class MoleculeMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text matches = new Text();
    private String pattern;

    @Override
    public void setup(Context context) {
        pattern = context.getConfiguration().get("net.rguha.dc.data.pattern");
        System.out.println("pattern = " + pattern);
    }
    ....
}
In my main method I have
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
conf.set("net.rguha.dc.data.pattern", otherArgs[2]);
However, even with this, pattern turns out to be null when printed in
setup().
I just started on Hadoop a day or two ago, and my understanding is
that 0.20.0 had some pretty major refactoring. As a result, a lot of
examples I come across on the Net don't seem to work. Could the lack
of the configure() method be due to the refactoring?
Re: specifying command line args, but getting an NPE
Posted by Rajarshi Guha <rg...@indiana.edu>.
On May 4, 2009, at 6:07 PM, Todd Lipcon wrote:
>
> Since you have a simple String here, this should be pretty simple.
> Something like:
>
> conf.set("com.example.tool.pattern", otherArgs[2]);
>
> then in the configure() function of your Mapper/Reducer, simply
> retrieve it using conf.get("com.example.tool.pattern");
Trial and error solved the problem. It turns out I need to set the
value in the Configuration object before I create the Job object.
Thus, the following works and makes the value of
net.rguha.dc.data.pattern available to the mappers.
Configuration conf = new Configuration();
conf.set("net.rguha.dc.data.pattern", otherArgs[2]);
Job job = new Job(conf, "id 1");
But if conf.set(...) is called after instantiating job, the value
never reaches the mappers. Is this the intended behavior?
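My guess at what's going on: the Job constructor takes a copy of the
Configuration, so conf.set() calls made after construction only touch my
original object, not the copy the job actually ships to the tasks. If
that's right, the behavior is just ordinary map-copy semantics, which a
plain-Java sketch shows (no Hadoop involved; the names and the SMILES
value are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class CopySemanticsDemo {
    // Stand-in for what I assume Job's constructor does:
    // take a private copy of the Configuration it is handed.
    static Map<String, String> newJobConf(Map<String, String> conf) {
        return new HashMap<>(conf);
    }

    // Replays the two orderings from the thread against one "job".
    static Map<String, String> scenario() {
        Map<String, String> conf = new HashMap<>();
        conf.put("net.rguha.dc.data.pattern", "c1ccccc1"); // set BEFORE the copy
        Map<String, String> jobConf = newJobConf(conf);    // "new Job(conf, ...)"
        conf.put("too.late", "xyz");                       // set AFTER the copy
        return jobConf;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = scenario();
        // The pre-copy key is visible; the post-copy key is not.
        System.out.println(jobConf.get("net.rguha.dc.data.pattern"));
        System.out.println(jobConf.get("too.late"));
    }
}
```

Under that assumption, the rule of thumb would be: finish populating the
Configuration before constructing the Job, or mutate the job's own copy
instead of the original.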
-------------------------------------------------------------------
Rajarshi Guha <rg...@indiana.edu>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Q: What's polite and works for the phone company?
A: A deferential operator.
Re: specifying command line args, but getting an NPE
Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, May 4, 2009 at 2:59 PM, Rajarshi Guha <rg...@indiana.edu> wrote:
> So my question is: if I need to use an argument, specified on the command
> line, do I need to do anything special to the variable holding it? In other
> words, the simple assignment
>
> pattern = otherArgs[2];
>
> seems to lead to an NPE when run in distributed mode.
>
Hi Rajarshi,
The issue here is that your mapper and reducer classes are being
instantiated in a different JVM from your main() function. In order to pass
data to them, you need to use the Configuration object.
Since you have a simple String here, this should be pretty simple. Something
like:
conf.set("com.example.tool.pattern", otherArgs[2]);
then in the configure() function of your Mapper/Reducer, simply retrieve it
using conf.get("com.example.tool.pattern");
Hope that helps,
-Todd