Posted to user@pig.apache.org by Prashant Kommireddi <pr...@gmail.com> on 2011/11/30 01:20:14 UTC

Re: Bag of Tuples Return type of UDF

You could FLATTEN the results from your UDF:
u = foreach g generate FLATTEN(UrlCoOccurence($1)) as (v1, v2);

On Tue, Nov 29, 2011 at 3:49 PM, Ayon Sinha <ay...@yahoo.com> wrote:

> Hi,
> I have a UDF that is:
> public DataBag exec(Tuple input) throws IOException
>
> This bag has tuples with 2 String fields each.
> How do I tell Pig to expect a bag{tuple(chararray, chararray)} from the
> UDF call?
>
> u = foreach g generate UrlCoOccurence($1) as pairs;
>
>
>
> I tried this
> u = foreach g generate (bag{tuple(chararray,
> chararray)})UrlCoOccurence($1) as pairs;
>
> this gives me:
> 2011-11-29 15:47:07,048 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1052: Cannot cast bag with schema bag to bag with schema
> bag({(chararray,chararray)})
>
>
> Basically my UDF returns a bag of tuples which have 2 values. I need to
> flatten it to get v1 & v2.
>
> -Ayon
>

Re: Bag of Tuples Return type of UDF

Posted by Ayon Sinha <ay...@yahoo.com>.
This worked:

import java.util.ArrayList;
import java.util.List;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

@Override
public Schema outputSchema(Schema input) {
    try {
        // Inner tuple: two chararray fields, one per URL in the pair.
        List<Schema.FieldSchema> fields = new ArrayList<Schema.FieldSchema>();
        fields.add(new Schema.FieldSchema("url1", DataType.CHARARRAY));
        fields.add(new Schema.FieldSchema("url2", DataType.CHARARRAY));
        Schema tupleSchema = new Schema(fields);
        Schema.FieldSchema tupleFs = new Schema.FieldSchema(
                "tuple_of_urls", tupleSchema, DataType.TUPLE);

        // Bag whose elements are those tuples.
        Schema bagSchema = new Schema(tupleFs);
        bagSchema.setTwoLevelAccessRequired(true);
        Schema.FieldSchema bagFs = new Schema.FieldSchema(
                "bag_of_urlTuples", bagSchema, DataType.BAG);

        return new Schema(bagFs);
    } catch (FrontendException e) {
        // Keep the original exception as the cause so the failure is traceable.
        throw new RuntimeException("Unable to compute UrlCoOccurence schema.", e);
    }
}
 
-Ayon



________________________________
 From: Jonathan Coveney <jc...@gmail.com>
To: user@pig.apache.org 
Sent: Tuesday, November 29, 2011 5:17 PM
Subject: Re: Bag of Tuples Return type of UDF
 
You can flatten, or you can override outputSchema so that you can specify
the output of the UDF.

You can find an example here:
http://pig.apache.org/docs/r0.9.1/udf.html#eval-functions

Or, if you're using trunk, you can use Dmitriy's @OutputSchema annotation,
so you'd do

@OutputSchema("b:bag{t:tuple(x:chararray,y:chararray)}")

and it would do the magic for you :)

If you're not using trunk, it is definitely still easier to use
Utils.getSchemaFromString() instead of building it up.


Re: Bag of Tuples Return type of UDF

Posted by Jonathan Coveney <jc...@gmail.com>.
You can flatten, or you can override outputSchema so that you can specify
the output of the UDF.

You can find an example here:
http://pig.apache.org/docs/r0.9.1/udf.html#eval-functions

Or, if you're using trunk, you can use Dmitriy's @OutputSchema annotation,
so you'd do

@OutputSchema("b:bag{t:tuple(x:chararray,y:chararray)}")

and it would do the magic for you :)

If you're not using trunk, it is definitely still easier to use
Utils.getSchemaFromString() instead of building it up.
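
A minimal sketch of the Utils.getSchemaFromString() route, for illustration only. It assumes the UDF class is named UrlCoOccurence as in the script quoted in this thread, and it reuses the schema-string syntax from the annotation example above. The alias "pairs", the field names url1/url2, and the placeholder exec body are hypothetical, and the Pig jar must be on the classpath to compile it.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

public class UrlCoOccurence extends EvalFunc<DataBag> {

    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Placeholder body (assumption): the real UDF would add one
        // (url1, url2) tuple to the bag per co-occurring pair.
        return BagFactory.getInstance().newDefaultBag();
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // Same shape as the annotation string: a bag of two-chararray tuples.
            return Utils.getSchemaFromString(
                    "pairs:bag{t:tuple(url1:chararray,url2:chararray)}");
        } catch (Exception e) {
            // Broad catch on purpose: I believe the checked exception declared
            // by getSchemaFromString has differed between Pig releases.
            throw new RuntimeException(
                    "Unable to parse UrlCoOccurence output schema", e);
        }
    }
}
```

With a schema declared this way, FLATTEN(UrlCoOccurence($1)) should no longer need the explicit cast that produced ERROR 1052.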
