Posted to dev@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/08/19 16:20:40 UTC

Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?

I am writing a LoadFunc called ArcFileReader to load Common Crawl data in
ArcFile format. There is already an ArcRecord, ArcRecordReader and
ArcInputFormat for Hadoop.

ArcInputFormat extends Hadoop's FileInputFormat, which implements Hadoop's
InputFormat interface. Why then can't I specify ArcInputFormat as my
InputFormat in my LoadFunc?

    @Override
    public InputFormat getInputFormat() throws IOException {
        return new ArcInputFormat();
    }


Java complains that I am attempting to use an incompatible return type.
What gives?

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?

Posted by Russell Jurney <ru...@gmail.com>.
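Figured this out: I was ping-ponging between the mapred and mapreduce APIs.
ArcInputFormat is built on the old org.apache.hadoop.mapred API, while
LoadFunc.getInputFormat() is declared against
org.apache.hadoop.mapreduce.InputFormat, so the two types are unrelated
even though they share a simple name.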

package org.apache.pig.piggybank.storage.arc;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.nutch.tools.arc.ArcInputFormat;
import org.apache.nutch.tools.arc.ArcRecordReader;

import java.io.IOException;

public class PigArcInputFormat extends FileInputFormat<Text, BytesWritable>
{

    public PigArcInputFormat() {
    }

    public ArcInputFormat getInputFormat() throws IOException {
        return new ArcInputFormat();
    }

    public RecordReader<Text, BytesWritable> getRecordReader(
            InputSplit split, JobConf config, Reporter reporter)
            throws IOException {
        return new ArcRecordReader(config, (FileSplit)split);
    }
}
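
For anyone hitting the same compile error, here is a minimal sketch of the
mix-up using stand-in types only. It assumes the old API models InputFormat
as an interface and the new API models it as an abstract class; OldApi,
NewApi, the stand-in LoadFunc and the describe() method are hypothetical
names invented for illustration, not Hadoop or Pig classes. The wrapper shown
is one possible direction, not a drop-in implementation.

```java
import java.io.IOException;

// Stand-in types only: nothing here is a real Hadoop, Nutch or Pig class.
public class ApiMixupSketch {

    // Mirrors the shape of the old org.apache.hadoop.mapred API (assumption):
    // InputFormat is an interface and FileInputFormat implements it.
    static class OldApi {
        interface InputFormat<K, V> { String describe(); }

        static abstract class FileInputFormat<K, V> implements InputFormat<K, V> { }

        static class ArcInputFormat extends FileInputFormat<String, byte[]> {
            @Override
            public String describe() { return "old-API ArcInputFormat"; }
        }
    }

    // Mirrors the shape of the new org.apache.hadoop.mapreduce API (assumption):
    // InputFormat is an abstract class with the same simple name.
    static class NewApi {
        static abstract class InputFormat<K, V> { abstract String describe(); }
    }

    // Stand-in for a loader base class whose method is typed against the new API.
    static abstract class LoadFunc {
        abstract NewApi.InputFormat<?, ?> getInputFormat() throws IOException;
    }

    static class ArcFileReader extends LoadFunc {
        @Override
        NewApi.InputFormat<?, ?> getInputFormat() throws IOException {
            // return new OldApi.ArcInputFormat();
            // The line above would not compile: OldApi.ArcInputFormat is not a
            // NewApi.InputFormat, even though both are called "InputFormat".

            // Instead, hand back a new-API object that delegates to the old one.
            return new NewApi.InputFormat<String, byte[]>() {
                private final OldApi.ArcInputFormat delegate = new OldApi.ArcInputFormat();

                @Override
                String describe() { return "new-API wrapper around " + delegate.describe(); }
            };
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new ArcFileReader().getInputFormat().describe());
    }
}
```

The point of the sketch is that the two InputFormat types only share a
simple name; which one the compiler sees depends entirely on the import.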



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com