Posted to user@pig.apache.org by Gianmarco De Francisci Morales <gd...@apache.org> on 2011/10/28 18:37:32 UTC

StoreFunc with Sequence file

Hi pig users,
I implemented a custom StoreFunc to write some data in a binary format to a
Sequence File.

    private RecordWriter<NullWritable, BytesWritable> writer;
    private BytesWritable bytes;
    private DataOutputBuffer dob;

    @SuppressWarnings("rawtypes")
    @Override
    public OutputFormat getOutputFormat() throws IOException {
        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
        this.bytes = new BytesWritable();
        this.dob = new DataOutputBuffer();
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        dob.reset();
        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
        DataBag childTracesBag = (DataBag) tuple.get(1);
        WritableUtils.writeVLong(dob, childTracesBag.size());
        for (Tuple t : childTracesBag) {
            WritableUtils.writeVInt(dob, (Integer) t.get(0));
            dob.writeLong((Long) t.get(1));
        }
        try {
            bytes.set(dob.getData(), 0, dob.getLength());
            writer.write(NullWritable.get(), bytes);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }


But I get this exception:


ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
recreate exception from backed error: java.io.IOException:
java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is
not class org.apache.pig.impl.io.NullableText



And if I use a NullableText instead of a NullWritable, I get this other
exception:


ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
recreate exception from backed error: java.io.IOException:
java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
is not class org.apache.pig.impl.io.NullableTuple



There must be something I am doing wrong in telling Pig the types of the
sequence file.

It must be a stupid problem, but I don't see it.

Does anybody have a clue?


Thanks,
--
Gianmarco

Re: StoreFunc with Sequence file

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Here is the pig script (I hope the formatting is kept). I think I could
reduce it to a simple load/store and still get the same problem, but I
didn't have time to check (I would need to rewrite the StoreFunc).
FYI, my StoreFunc tries to write a SequenceFile<NullWritable,
BytesWritable>:

    @Override
    public OutputFormat<NullWritable, BytesWritable> getOutputFormat() throws IOException {
        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
    }


rawtraces = LOAD '$log' AS (follower:chararray, action:int, time:long);
groupedtraces = GROUP rawtraces BY follower;
traces = FOREACH groupedtraces GENERATE group AS performer, rawtraces.(action, time) AS t;

rawsn = LOAD '$network' AS (parent:chararray, child:chararray);
groupedsn = GROUP rawsn BY parent;
sn = FOREACH groupedsn GENERATE group AS parent, rawsn.(child) AS children;

join1 = JOIN traces BY performer, sn BY parent;
cleanJ1 = FOREACH join1 GENERATE traces::performer AS parent, traces::t AS parentTraces, FLATTEN(sn::children) AS child;
groupedJ1 = GROUP cleanJ1 BY child;
intermediate = FOREACH groupedJ1 GENERATE group AS child, cleanJ1.(parent, parentTraces) AS legacy;

join2 = JOIN traces BY performer, intermediate BY child;
result = FOREACH join2 GENERATE traces::performer AS child, traces::t AS childTraces, intermediate::legacy AS legacy;

STORE result INTO '$output' USING mypackage.pig.BinStorage();

And here is the stack trace:

java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.NullWritable is not class
org.apache.pig.impl.io.NullableText
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:255)
Caused by: java.io.IOException: wrong key class:
org.apache.hadoop.io.NullWritable is not class
org.apache.pig.impl.io.NullableText
	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:985)
	at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
	at mypackage.pig.BinStorage.putNext(BinStorage.java:75)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
	... 11 more


Cheers,
--
Gianmarco

Re: StoreFunc with Sequence file

Posted by Ashutosh Chauhan <ha...@apache.org>.
Actually, what I said was not entirely correct. Per Daniel, Pig's load/store
funcs are designed to work with any InputFormat/OutputFormat that works on
<WritableComparable, Writable>, so what you are seeing is not expected. Can
you paste the pig script you are using and the detailed stack trace? You
can find that in the JobTracker log.

Hope it helps,
Ashutosh

Re: StoreFunc with Sequence file

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Thanks Ashutosh,

your suggestion helped.
Actually, I am loading data using PigStorage, so my output <key, value>
pairs are declared as <NullableText, NullableTuple>.

By declaring my getOutputFormat() to return
a SequenceFileOutputFormat<NullableText, NullableTuple>() I managed to make
it work.

The downside is that now I need to wrap my bytes in a Tuple and wrap the
Tuple in a NullableTuple.
Is this the intended way it should work?
Why not let the user use any <WritableComparable, Writable> pair instead?
It should be possible for Pig to use the classes defined by the user in the
StoreFunc in order to define the OutputKeyClass and OutputValueClass.
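
For reference, here is roughly what the working version looks like. This is
a sketch typed from memory, not verified code: the NullableText(String) and
NullableTuple(Tuple) constructors and the DataByteArray wrapping are my
best reading of the Pig API (imports from org.apache.pig.data and
org.apache.pig.impl.io assumed).

    private RecordWriter<NullableText, NullableTuple> writer;
    private DataOutputBuffer dob;
    private TupleFactory tupleFactory;

    @Override
    public OutputFormat<NullableText, NullableTuple> getOutputFormat() throws IOException {
        // Key/value types now match what Pig's reduce side emits.
        return new SequenceFileOutputFormat<NullableText, NullableTuple>();
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
        this.dob = new DataOutputBuffer();
        this.tupleFactory = TupleFactory.getInstance();
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        dob.reset();
        // ... serialize the tuple into dob exactly as before ...
        byte[] payload = new byte[dob.getLength()];
        System.arraycopy(dob.getData(), 0, payload, 0, dob.getLength());
        // Wrap the raw bytes in a Tuple, and the Tuple in a NullableTuple.
        Tuple wrapper = tupleFactory.newTuple(1);
        wrapper.set(0, new DataByteArray(payload));
        try {
            writer.write(new NullableText((String) tuple.get(0)), new NullableTuple(wrapper));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }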

Cheers,
--
Gianmarco

Re: StoreFunc with Sequence file

Posted by Ashutosh Chauhan <ha...@apache.org>.
Hey Gianmarco,

How are you loading data in the pig script? Using your own LoadFunc? Pig
declares the following types to the MR framework:

Map:
  KeyIn: Text, ValueIn: Tuple
Reducer:
  KeyOut: PigNullableWritable, ValueOut: Writable

So, your loadfunc/storefunc key/value types must extend from these.
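
To see where the "wrong key class" message itself comes from:
SequenceFileOutputFormat creates its writer with the job's configured
output key/value classes, which Pig has already set to its own wrapper
types, and SequenceFile's append() rejects any record whose runtime class
differs. A toy model of that check (simplified from memory; not Hadoop's
actual code):

    import java.io.IOException;

    // Stand-in for the check inside SequenceFile.Writer.append().
    class KeyClassCheck {
        private final Class<?> keyClass; // class the writer was created with

        KeyClassCheck(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        void append(Object key) throws IOException {
            // Pig configured the job with NullableText as the output key
            // class, so a NullWritable key fails this comparison at write time.
            if (key.getClass() != keyClass) {
                throw new IOException("wrong key class: " + key.getClass().getName()
                        + " is not class " + keyClass.getName());
            }
        }
    }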

Hope it helps,
Ashutosh