Posted to dev@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/08/19 16:20:40 UTC
Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?
I am writing a LoadFunc called ArcFileReader to load Common Crawl data in
ArcFile format. There are already an ArcRecord, an ArcRecordReader and an
ArcInputFormat for Hadoop.
ArcInputFormat extends Hadoop's FileInputFormat, which implements Hadoop's
InputFormat interface. Why then can't I specify ArcInputFormat as my
InputFormat in my LoadFunc?
@Override
public InputFormat getInputFormat() throws IOException {
    return new ArcInputFormat();
}
Java complains - attempting to use incompatible return type. What gives?
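For anyone hitting the same error: Hadoop ships two unrelated types both named InputFormat (org.apache.hadoop.mapred.InputFormat and org.apache.hadoop.mapreduce.InputFormat), and overriding a method with the wrong one is rejected by javac. Here is a minimal, self-contained sketch of that failure mode using stand-in classes (Old, New, LoadFuncLike, BadLoader are all hypothetical names, not Hadoop or Pig APIs):

```java
// Two packages each define a type with the same simple name,
// mirroring Hadoop's mapred vs. mapreduce split (names are made up).
class Old { static class InputFormat {} }  // stands in for mapred.InputFormat
class New { static class InputFormat {} }  // stands in for mapreduce.InputFormat

abstract class LoadFuncLike {
    // Pig's LoadFunc declares the NEW-API type in its signature.
    public abstract New.InputFormat getInputFormat();
}

class BadLoader extends LoadFuncLike {
    // Returning Old.InputFormat here would NOT be an override of the
    // abstract method above - javac reports an incompatible return type,
    // because the two InputFormat classes share nothing but a name.
    @Override
    public New.InputFormat getInputFormat() { return new New.InputFormat(); }
}

public class Demo {
    public static void main(String[] args) {
        LoadFuncLike l = new BadLoader();
        // Both types print the same simple name, which is what makes
        // the compile error so confusing at first glance.
        System.out.println(l.getInputFormat().getClass().getSimpleName());
    }
}
```

The takeaway: check the import at the top of the file, not the class's simple name, when the compiler claims two identically-named types are incompatible.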
--
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
Re: Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?
Posted by Russell Jurney <ru...@gmail.com>.
Figured this out - I was ping-ponging between the mapred and mapreduce APIs.
package org.apache.pig.piggybank.storage.arc;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.nutch.tools.arc.ArcInputFormat;
import org.apache.nutch.tools.arc.ArcRecordReader;

import java.io.IOException;

// Wraps Nutch's ArcInputFormat/ArcRecordReader so that only one Hadoop
// API (the old mapred one) is in play, instead of mixing mapred and
// mapreduce types.
public class PigArcInputFormat extends FileInputFormat<Text, BytesWritable> {

    public PigArcInputFormat() {
    }

    public ArcInputFormat getInputFormat() throws IOException {
        return new ArcInputFormat();
    }

    public RecordReader<Text, BytesWritable> getRecordReader(InputSplit split,
            JobConf config, Reporter reporter) throws IOException {
        // Delegate record reading to Nutch's existing ARC reader.
        return new ArcRecordReader(config, (FileSplit) split);
    }
}
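The class above is essentially the adapter pattern: keep the existing old-API reader, and expose it behind the interface the consuming framework actually expects. A minimal, self-contained sketch of that idea (NewRecordReader, OldArcReader, and ArcReaderAdapter are hypothetical stand-ins, not the real Hadoop or Nutch types):

```java
// The interface the framework expects (stand-in for the new-API reader).
interface NewRecordReader {
    String nextValue(); // returns null when exhausted
}

// The existing reader we cannot change (stand-in for Nutch's
// mapred-based ArcRecordReader).
class OldArcReader {
    private final String[] records = {"rec1", "rec2"};
    private int i = 0;
    String next() { return i < records.length ? records[i++] : null; }
}

// Adapter: implements the expected interface by delegating to the
// old reader, so callers never see the old API at all.
class ArcReaderAdapter implements NewRecordReader {
    private final OldArcReader inner = new OldArcReader();
    @Override
    public String nextValue() { return inner.next(); }
}

public class WrapDemo {
    public static void main(String[] args) {
        NewRecordReader r = new ArcReaderAdapter();
        String v;
        while ((v = r.nextValue()) != null) {
            System.out.println(v);
        }
    }
}
```

The design choice is the same one PigArcInputFormat makes: rather than modifying the legacy class, wrap it once at the boundary, so the rest of the code depends only on the API it was written against.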
On Sun, Aug 19, 2012 at 7:20 AM, Russell Jurney <ru...@gmail.com> wrote:
> I am writing a LoadFunc called ArcFileReader to load Common Crawl data in
> ArcFile format. There is already a ArcRecord, ArcRecordReader and
> ArcInputFormat for Hadoop.
>
> ArcInputFormat extends Hadoop's FileInputFormat, which implements Hadoop's
> InputFormat interface. Why then can't I specify ArcInputFormat as my
> InputFormat in my LoadFunc?
>
> @Override
> public InputFormat getInputFormat() throws IOException {
> return new ArcInputFormat();
> }
>
>
> Java complains - attempting to use incompatible return type. What gives?
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
--
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com