Posted to common-user@hadoop.apache.org by Haijun Cao <Ha...@projectrialto.com> on 2008/05/06 02:28:01 UTC
hadoop.mapred.join.Parser does not work with KeyValueTextInputFormat
Hi, Chris,
Thanks for adding the map-side join feature (http://issues.apache.org/jira/browse/HADOOP-2085).
I tried the join example with KeyValueTextInputFormat as the input format, but got the following exception:
java.lang.NullPointerException
at org.apache.hadoop.mapred.KeyValueTextInputFormat.isSplitable(KeyValueTextInputFormat.java:44)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:185)
at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:304)
at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:374)
at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:129)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:542)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:803)
at org.apache.hadoop.examples.Join.run(Join.java:169)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Join.main(Join.java:178)
The exception happens because hadoop.mapred.join.Parser instantiates the InputFormat class without a JobConf, while KeyValueTextInputFormat needs its configure method to be called with a proper JobConf before isSplitable is used.
public void parse(List<Token> ll) throws IOException {
  StringBuilder sb = new StringBuilder();
  Iterator<Token> i = ll.iterator();
  while (i.hasNext()) {
    Token t = i.next();
    if (TType.COMMA.equals(t.getType())) {
      try {
        inf = (InputFormat)ReflectionUtils.newInstance(
            Class.forName(sb.toString()).asSubclass(InputFormat.class),
            null);  // <-- missing JobConf
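To illustrate the failure mode outside Hadoop, here is a minimal self-contained sketch. None of these classes are Hadoop's: SketchFormat, SketchKeyValueFormat, ConfigureDemo and the "sketch.compressed.suffixes" key are hypothetical stand-ins, and the sketch assumes (as the stack trace suggests, but I have not confirmed in the source) that isSplitable dereferences a field that only configure initializes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo, not Hadoop code: reflective instantiation with and without a conf. */
class ConfigureDemo {
    // Mimics the pattern in the post: create the object reflectively and
    // call configure only when a (non-null) conf is supplied.
    static SketchFormat newInstance(Class<? extends SketchFormat> cls, Map<String, String> conf) {
        try {
            SketchFormat f = cls.getDeclaredConstructor().newInstance();
            if (conf != null) {
                f.configure(conf);
            }
            return f;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if isSplitable fails with a NullPointerException.
    static boolean throwsNpe(SketchFormat f, String file) {
        try {
            f.isSplitable(file);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf =
            java.util.Collections.singletonMap("sketch.compressed.suffixes", ".gz,.bz2");
        SketchFormat bare = newInstance(SketchKeyValueFormat.class, null);
        SketchFormat configured = newInstance(SketchKeyValueFormat.class, conf);
        System.out.println("unconfigured throws NPE: " + throwsNpe(bare, "part-00000"));
        System.out.println("configured throws NPE: " + throwsNpe(configured, "part-00000"));
    }
}

/** Hypothetical stand-in for an input format that must be configured before use. */
interface SketchFormat {
    void configure(Map<String, String> conf);
    boolean isSplitable(String file);
}

class SketchKeyValueFormat implements SketchFormat {
    private Set<String> compressedSuffixes; // stays null until configure runs

    public void configure(Map<String, String> conf) {
        Set<String> suffixes = new HashSet<>();
        for (String s : conf.getOrDefault("sketch.compressed.suffixes", "").split(",")) {
            if (!s.isEmpty()) {
                suffixes.add(s);
            }
        }
        compressedSuffixes = suffixes;
    }

    public boolean isSplitable(String file) {
        int dot = file.lastIndexOf('.');
        String suffix = dot < 0 ? "" : file.substring(dot);
        // Dereferences the field; throws NullPointerException if configure was skipped.
        return !compressedSuffixes.contains(suffix);
    }
}
```

The point of the sketch is only that passing null at instantiation time silently skips configuration, so the failure surfaces later, at the first method that relies on configured state.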
As a workaround, I added a "setConf" call in hadoop.mapred.join.Parser's getSplits method. With that change the NullPointerException is gone and the join works as expected.
I am not sure this is a clean fix. Ideally, I'd like to pass the JobConf object into the parse method, so that it is available when the InputFormat is instantiated...
public InputSplit[] getSplits(JobConf job, int numSplits)
    throws IOException {
  ReflectionUtils.setConf(inf, job);  // <-- my workaround
  return inf.getSplits(getConf(job), numSplits);
}
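The two options (configuring late in getSplits versus configuring when the format is instantiated during parsing) can be compared in a small standalone sketch. Everything here is hypothetical and not part of Hadoop: WrappedNode stands in for the parser node that wraps an input format, PrefixFormat for a format that needs configuring, and a plain Map for the JobConf.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical demo, not Hadoop code: where should configure be called? */
class LateConfigureDemo {
    public static void main(String[] args) {
        Map<String, String> job = Collections.singletonMap("splits.prefix", "part-");

        // Option 1: instantiate without a conf, configure inside getSplits (the workaround).
        WrappedNode late = WrappedNode.parse(PrefixFormat.class, null);
        System.out.println(String.join(",", late.getSplitsWithWorkaround(job, 2)));

        // Option 2: hand the conf over at parse time, so getSplits needs no patch.
        WrappedNode eager = WrappedNode.parse(PrefixFormat.class, job);
        System.out.println(String.join(",", eager.getSplits(job, 2)));
    }
}

interface ConfigurableFormat {
    void configure(Map<String, String> job);
    String[] getSplits(Map<String, String> job, int numSplits);
}

class PrefixFormat implements ConfigurableFormat {
    private String prefix; // stays null until configure runs

    public void configure(Map<String, String> job) {
        prefix = job.get("splits.prefix");
    }

    public String[] getSplits(Map<String, String> job, int numSplits) {
        String[] out = new String[numSplits];
        for (int i = 0; i < numSplits; i++) {
            // Throws NullPointerException if configure was never called.
            out[i] = prefix.concat(Integer.toString(i));
        }
        return out;
    }
}

class WrappedNode {
    private final ConfigurableFormat inf;

    private WrappedNode(ConfigurableFormat inf) {
        this.inf = inf;
    }

    // Instantiates the format reflectively; configures it only if a conf is given.
    static WrappedNode parse(Class<? extends ConfigurableFormat> cls, Map<String, String> jobOrNull) {
        try {
            ConfigurableFormat f = cls.getDeclaredConstructor().newInstance();
            if (jobOrNull != null) {
                f.configure(jobOrNull);
            }
            return new WrappedNode(f);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Unpatched behaviour: delegates directly, relying on earlier configuration.
    String[] getSplits(Map<String, String> job, int numSplits) {
        return inf.getSplits(job, numSplits);
    }

    // Workaround behaviour: configure on every call, then delegate.
    String[] getSplitsWithWorkaround(Map<String, String> job, int numSplits) {
        inf.configure(job);
        return inf.getSplits(job, numSplits);
    }
}
```

One trade-off the sketch makes visible: the workaround reconfigures the format on every getSplits call, and any other method that uses the wrapped format would need the same treatment, whereas configuring at instantiation happens once.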
Many InputFormat subclasses may need their configure method to be called. Can you look into this to see whether it is a valid bug? Thanks.
Haijun