Posted to user@pig.apache.org by Manu <ma...@sela.co.il> on 2012/02/25 21:26:22 UTC

How to create Bag of tuples using eval UDF? How set its schema

Hi,

I would like to create a bag of tuples using an eval UDF.

I wrote a simple eval method, but when I use it Pig cannot figure out the
schema of the UDF's output.

When I call "describe" on the output I get {(null)}.

I tried to set the schema using the "as" clause (i.e. "as
{(w1:chararray, w2:chararray)}"), but Pig cannot parse this.

To test this I wrote the following eval method and called it using the
following commands:

A = Load 'test' as (x:int,y:int);
B = ForEach A Generate CreateBag(x,y);
Describe B;

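import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class CreateBag extends EvalFunc<DataBag> {

    TupleFactory mTupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Tuple.get() returns Object, so cast to Integer and let Java unbox.
        int a = (Integer) input.get(0);
        int b = (Integer) input.get(1);

        DataBag result = mBagFactory.newDefaultBag();

        // First two-field tuple.
        Tuple t1 = mTupleFactory.newTuple(2);
        t1.set(0, a + 1);
        t1.set(1, b + 1);

        // Second two-field tuple.
        Tuple t2 = mTupleFactory.newTuple(2);
        t2.set(0, a + 1);
        t2.set(1, b + 1);

        result.add(t1);
        result.add(t2);

        return result;
    }
}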

 

How can I call it so that Pig sees the correct schema, i.e.
{(n1:int,n2:int)}?

 

Thanks

 

Manu Cohen-Yashar

Senior Architect, Cloud Computing and Application Security

Sela Group

 

Phone: 972-4-9881203

Mobile: 972-52-5574551

 


Re: How to create Bag of tuples using eval UDF? How set its schema

Posted by Jonathan Coveney <jc...@gmail.com>.
Manu,

If you take a look at the page, you can see that part of the issue is
simply that you need to name the bag (depending on the version you are
using). Try "as b:bag{t:tuple(w1:chararray, w2:chararray)}" and let me
know if that fails.

As for how to have a UDF properly output its schema, you need to
override the outputSchema function. You can tediously build up the Schema
with schema objects, or you can do something kind of like this...

public Schema outputSchema(Schema input) {
    return Utils.getSchemaFromString("b:bag{t:tuple(w1:chararray,w2:chararray)}");
}

This ignores some error handling, and I forget whether the class is Utils
or PigUtils, but the point remains that it's definitely the easiest way to
make a schema, if you know that it won't change based on the input.
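
For reference, here is a minimal sketch of how the two approaches above
might look inside the UDF from the question. It assumes Pig 0.9 or later,
where (as far as I know) the helper is org.apache.pig.impl.util.Utils and
getSchemaFromString throws ParserException; check this against your
version. The field names n1 and n2 come from the question, the aliases b
and t from the "as" clause above, and schemaByHand is a made-up helper
name used only to show the object-based alternative.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;
import org.apache.pig.parser.ParserException;

public class CreateBag extends EvalFunc<DataBag> {

    private static final TupleFactory TUPLES = TupleFactory.getInstance();
    private static final BagFactory BAGS = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null;
        }
        Integer a = (Integer) input.get(0);
        Integer b = (Integer) input.get(1);
        if (a == null || b == null) {
            return null; // Pig fields can be null
        }

        DataBag result = BAGS.newDefaultBag();
        // Same two tuples as in the question: (a+1, b+1) added twice.
        for (int i = 0; i < 2; i++) {
            Tuple t = TUPLES.newTuple(2);
            t.set(0, a + 1);
            t.set(1, b + 1);
            result.add(t);
        }
        return result;
    }

    // Approach 1: parse the schema from a string (assumes Pig 0.9+).
    @Override
    public Schema outputSchema(Schema input) {
        try {
            return Utils.getSchemaFromString("b:bag{t:tuple(n1:int,n2:int)}");
        } catch (ParserException e) {
            throw new RuntimeException("Could not parse schema string", e);
        }
    }

    // Approach 2 (hypothetical helper): build the same schema with objects.
    static Schema schemaByHand() throws FrontendException {
        Schema tupleFields = new Schema();
        tupleFields.add(new Schema.FieldSchema("n1", DataType.INTEGER));
        tupleFields.add(new Schema.FieldSchema("n2", DataType.INTEGER));
        Schema bagContents = new Schema(
                new Schema.FieldSchema("t", tupleFields, DataType.TUPLE));
        return new Schema(
                new Schema.FieldSchema("b", bagContents, DataType.BAG));
    }
}
```

With the jar registered, "Describe" on a relation generated by
CreateBag(x,y) should then report a bag of (n1:int, n2:int) tuples instead
of {(null)}. The string form is shorter; the object form is mainly useful
when the output schema has to be derived from the input schema.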

2012/2/25 Srinivas Reddy <pi...@gmail.com>

> Manu,
>
> You need to define the schema for the bag in the UDF. See the link below
> (look at the Schema section).
>
> http://pig.apache.org/docs/r0.7.0/udf.html#Schema
>
> --
> Regards,
> Srinivas
> Srinivas@cloudwick.com
>

Re: How to create Bag of tuples using eval UDF? How set its schema

Posted by Srinivas Reddy <pi...@gmail.com>.
Manu,

You need to define the schema for the bag in the UDF. See the link below
(look at the Schema section).

http://pig.apache.org/docs/r0.7.0/udf.html#Schema

Regards,




-- 
Regards,
Srinivas
Srinivas@cloudwick.com