Posted to user@pig.apache.org by "C 4.6" <cf...@gmail.com> on 2011/01/25 19:22:40 UTC

Custom Seq File Loader: ClassNotFoundException

Hi All,

I am having ClassNotFoundException problems with a custom load function.

- I am using Pig-0.7.0 with Hadoop-0.20.2
- Input to the job is a sequence file with custom key/value data
- I am including the load UDF source below. Note that the UDF does not care
  about what is inside the sequence file itself.
- This is the Pig script I run:

>> register MyUDF.jar;
>> data = LOAD 'myfile.seq' USING MyLoader();
>> DUMP data;

- If I use a sequence file containing primitive types (key=int, val=text),
  the script and my load function behave exactly as expected
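
  A primitive-typed test file like that can be written along these lines
  (just a minimal sketch; the path, contents, and class name here are
  placeholders, not my actual test code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class WritePrimitiveSeqFile {
    public static void main(final String[] args) throws IOException {
        final Configuration conf = new Configuration();
        final FileSystem fs = FileSystem.get(conf);
        final Path path = new Path("myfile.seq");

        // The key/value classes are recorded by name in the file header;
        // the reader later resolves them via WritableName.getClass().
        final SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, IntWritable.class, Text.class);
        try {
            writer.append(new IntWritable(1), new Text("one"));
        } finally {
            writer.close();
        }
    }
}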

- However, when I use the intended sequence file, I get the following
  ClassNotFound error:

java.lang.RuntimeException: java.io.IOException: WritableName can't load class: com...DataOutputKey
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:1598)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1548)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:133)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.io.IOException: WritableName can't load class: com...DataOutputKey
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:73)
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:1596)
        ... 10 more
Caused by: java.lang.ClassNotFoundException: com...DataOutputKey
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)

- My UDF jar does contain the data types that the sequence file reader
  reports as missing (I have also double-checked this by manually
  exploding the jar)
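
  For what it's worth, the trace shows WritableName resolving the key class
  through Configuration.getClassByName(), i.e. through the configuration's
  classloader rather than the one that loaded my UDF jar. A quick standalone
  check along these lines (just a sketch; the class name is passed as an
  argument since the real one is elided above) shows whether the class is
  visible to each loader:

import org.apache.hadoop.conf.Configuration;

public class ClassLoaderCheck {
    public static void main(final String[] args) {
        // Pass the fully-qualified key class name as the only argument
        // (elided as com...DataOutputKey in the stack trace above).
        final String keyClassName = args[0];

        // This is the lookup path WritableName.getClass() uses.
        final Configuration conf = new Configuration();
        try {
            conf.getClassByName(keyClassName);
            System.out.println("visible to the Configuration's classloader");
        } catch (final ClassNotFoundException e) {
            System.out.println("NOT visible to the Configuration's classloader");
        }

        // For comparison: the classloader that loaded this class.
        try {
            Class.forName(keyClassName, false,
                    ClassLoaderCheck.class.getClassLoader());
            System.out.println("visible to the caller's classloader");
        } catch (final ClassNotFoundException e) {
            System.out.println("NOT visible to the caller's classloader");
        }
    }
}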

- Any suggestions on how to resolve this issue?

Thanks,
CF

===

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader;
import org.apache.pig.FileInputLoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

import com.google.protobuf.InvalidProtocolBufferException;

public class MyLoader extends FileInputLoadFunc {

    private SequenceFileRecordReader<Writable, Writable> recordReader = null;

    private final ArrayList<Object> parsedTuple = new ArrayList<Object>(50);

    protected TupleFactory tupleFactory = TupleFactory.getInstance();

    @SuppressWarnings("unchecked")
    @Override
    public InputFormat getInputFormat() throws IOException {
        return new SequenceFileInputFormat<Writable, Writable>();
    }

    @SuppressWarnings("unchecked")
    @Override
    public void prepareToRead(final RecordReader recordReader,
            final PigSplit pigSplit) throws IOException {
        this.recordReader =
                (SequenceFileRecordReader<Writable, Writable>) recordReader;
    }

    @Override
    public void setLocation(final String location, final Job job)
            throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    // Stub: the real loader would parse the value here; for this test the
    // contents of the sequence file are ignored.
    private void parseValue(final Writable value)
            throws InvalidProtocolBufferException {
        parsedTuple.add("one");
    }

    @Override
    public Tuple getNext() throws IOException {
        final boolean goOn;

        try {
            goOn = recordReader.nextKeyValue();
        } catch (final InterruptedException ie) {
            throw new IOException(ie);
        }

        // End of input: return null so Pig stops calling getNext().
        if (!goOn) {
            return null;
        }

        final Writable value = recordReader.getCurrentValue();
        if (value == null) {
            return null;
        }

        parseValue(value);

        final Tuple t = tupleFactory.newTuple(parsedTuple);
        parsedTuple.clear();
        return t;
    }
}