Posted to common-user@hadoop.apache.org by Haijun Cao <Ha...@projectrialto.com> on 2008/05/06 02:28:01 UTC

hadoop.mapred.join.Parser does not work with KeyValueTextInputFormat

Hi, Chris,

 

Thanks for adding the map-side join feature (http://issues.apache.org/jira/browse/HADOOP-2085).

 

I tried the join example with KeyValueTextInputFormat as the input format, but got the following exception:

 

 

java.lang.NullPointerException
        at org.apache.hadoop.mapred.KeyValueTextInputFormat.isSplitable(KeyValueTextInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:185)
        at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:304)
        at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:374)
        at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:129)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:542)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:803)
        at org.apache.hadoop.examples.Join.run(Join.java:169)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Join.main(Join.java:178)

 

 

The exception happens because hadoop.mapred.join.Parser instantiates the InputFormat class without a JobConf, while KeyValueTextInputFormat needs its configure method to be called with a proper JobConf. The relevant code in Parser:
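The failure mode can be reproduced in miniature with the sketch below. It does not use Hadoop: Conf, NeedsConf, KvFormat, and NullConfDemo.newInstance are hypothetical stand-ins for JobConf, JobConfigurable, KeyValueTextInputFormat, and ReflectionUtils.newInstance. It assumes, as the stack trace suggests, that the format initializes a field in configure and dereferences it in isSplitable, and that the reflective factory skips configure when it is handed a null configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf.
class Conf {
    final Map<String, String> props = new HashMap<>();
}

// Hypothetical stand-in for JobConfigurable.
interface NeedsConf {
    void configure(Conf conf);
}

// Hypothetical stand-in for an input format that builds a helper object
// in configure() and uses it later in isSplitable().
class KvFormat implements NeedsConf {
    private Map<String, String> codecs; // stays null until configure() runs

    @Override
    public void configure(Conf conf) {
        codecs = new HashMap<>(conf.props);
    }

    boolean isSplitable(String file) {
        // Throws NullPointerException if configure() was never called.
        return !codecs.containsKey(file);
    }
}

class NullConfDemo {
    // Assumed factory behavior: configure the new object only when conf is non-null.
    static <T> T newInstance(Class<T> cls, Conf conf) throws Exception {
        T obj = cls.getDeclaredConstructor().newInstance();
        if (conf != null && obj instanceof NeedsConf) {
            ((NeedsConf) obj).configure(conf);
        }
        return obj;
    }

    // Returns true if calling isSplitable() throws NullPointerException.
    static boolean failsWithNpe(KvFormat format) {
        try {
            format.isSplitable("part-00000");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Instantiated with null, as in Parser.parse: configure() is skipped.
        System.out.println(failsWithNpe(newInstance(KvFormat.class, null)));
        // Instantiated with a configuration: configure() runs first.
        System.out.println(failsWithNpe(newInstance(KvFormat.class, new Conf())));
    }
}
```

In this sketch the first call reports the NullPointerException and the second does not, which mirrors the difference between passing null and passing a real JobConf.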

 

    public void parse(List<Token> ll) throws IOException {
      StringBuilder sb = new StringBuilder();
      Iterator<Token> i = ll.iterator();
      while (i.hasNext()) {
        Token t = i.next();
        if (TType.COMMA.equals(t.getType())) {
          try {
            inf = (InputFormat)ReflectionUtils.newInstance(
                Class.forName(sb.toString()).asSubclass(InputFormat.class),
                null);     // <-- missing JobConf

 

As a workaround, I added a "setConf" call to the getSplits method in hadoop.mapred.join.Parser. With that change the NullPointerException is gone and the join works as expected.

 

I am not sure if this is a clean fix. Ideally, I'd like to pass the JobConf object to the parse method, where the InputFormat is instantiated. The workaround looks like this:

 

    public InputSplit[] getSplits(JobConf job, int numSplits)
        throws IOException {
      ReflectionUtils.setConf(inf, job);   // <-- my workaround
      return inf.getSplits(getConf(job), numSplits);
    }
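To compare the two options (configuring late in getSplits versus configuring when the format is created), here is a small self-contained sketch. It is not Hadoop code: JobSettings, LineFormat, LazyLeafNode, and EagerLeafNode are hypothetical stand-ins for JobConf, a configurable InputFormat, and the Parser node that wraps it.

```java
// Hypothetical stand-in for JobConf.
class JobSettings {
    final String separator;

    JobSettings(String separator) {
        this.separator = separator;
    }
}

// Hypothetical stand-in for an InputFormat that must be configured before use.
class LineFormat {
    private String separator; // null until configure() is called

    void configure(JobSettings job) {
        this.separator = job.separator;
    }

    // Splits a line into key and value at the first separator.
    // Throws NullPointerException if configure() was skipped.
    String[] split(String line) {
        int i = line.indexOf(separator);
        if (i < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, i), line.substring(i + separator.length())};
    }
}

// Workaround style: the format is created unconfigured and is configured
// right before each delegation.
class LazyLeafNode {
    private final LineFormat inf = new LineFormat();

    String[] getSplits(JobSettings job, String line) {
        inf.configure(job); // late configuration, analogous to the setConf call
        return inf.split(line);
    }
}

// Alternative style: the settings are available when the node is built,
// so the format is configured exactly once.
class EagerLeafNode {
    private final LineFormat inf = new LineFormat();

    EagerLeafNode(JobSettings job) {
        inf.configure(job);
    }

    String[] getSplits(String line) {
        return inf.split(line);
    }
}

class ConfigureTimingDemo {
    public static void main(String[] args) {
        JobSettings job = new JobSettings("\t");
        String[] lazy = new LazyLeafNode().getSplits(job, "k1\tv1");
        String[] eager = new EagerLeafNode(job).getSplits("k1\tv1");
        System.out.println(lazy[0] + "=" + lazy[1]);
        System.out.println(eager[0] + "=" + eager[1]);
    }
}
```

Both variants produce the same result here. The late variant needs no change to how the node is constructed but reconfigures on every call; the eager variant configures once but requires the settings to be available at parse time.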

 

Many InputFormat subclasses may need their configure method to be called. Could you look into this and confirm whether it is a valid bug? Thanks.

 

Haijun