Posted to user@pig.apache.org by Prashant Kommireddi <pr...@gmail.com> on 2011/11/30 01:20:14 UTC
Re: Bag of Tuples Return type of UDF
You could FLATTEN the results of your UDF:
u = foreach g generate FLATTEN(UrlCoOccurence($1)) as (v1, v2);
On Tue, Nov 29, 2011 at 3:49 PM, Ayon Sinha <ay...@yahoo.com> wrote:
> Hi,
> I have a UDF that is:
> public DataBag exec(Tuple input) throws IOException
>
> This bag has tuples with 2 String fields each.
> How do I tell Pig to expect a bag{tuple(chararray, chararray)} from the
> UDF call?
>
> u = foreach g generate UrlCoOccurence($1) as pairs;
>
>
>
> I tried this:
> u = foreach g generate (bag{tuple(chararray, chararray)})UrlCoOccurence($1) as pairs;
>
> This gives me:
> 2011-11-29 15:47:07,048 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1052: Cannot cast bag with schema bag to bag with schema
> bag({(chararray,chararray)})
>
>
> Basically, my UDF returns a bag of tuples that each have 2 values. I need to
> flatten it into v1 and v2.
>
> -Ayon
>
Re: Bag of Tuples Return type of UDF
Posted by Ayon Sinha <ay...@yahoo.com>.
This worked:
@Override
public Schema outputSchema(Schema input) {
    try {
        // Each tuple in the bag holds two chararray fields.
        List<Schema.FieldSchema> fields = new ArrayList<Schema.FieldSchema>();
        fields.add(new Schema.FieldSchema("url1", DataType.CHARARRAY));
        fields.add(new Schema.FieldSchema("url2", DataType.CHARARRAY));
        Schema tupleSchema = new Schema(fields);

        // Wrap the tuple schema in a bag schema.
        Schema.FieldSchema tupleFs =
            new Schema.FieldSchema("tuple_of_urls", tupleSchema, DataType.TUPLE);
        Schema bagSchema = new Schema(tupleFs);
        bagSchema.setTwoLevelAccessRequired(true);
        Schema.FieldSchema bagFs =
            new Schema.FieldSchema("bag_of_urlTuples", bagSchema, DataType.BAG);
        return new Schema(bagFs);
    } catch (FrontendException e) {
        // Keep the cause so the underlying error is not lost.
        throw new RuntimeException("Unable to compute UrlCoOccurence schema.", e);
    }
}
-Ayon
________________________________
From: Jonathan Coveney <jc...@gmail.com>
To: user@pig.apache.org
Sent: Tuesday, November 29, 2011 5:17 PM
Subject: Re: Bag of Tuples Return type of UDF
You can flatten, or you can override outputSchema so that you can specify
the output of the UDF.
You can find an example here:
http://pig.apache.org/docs/r0.9.1/udf.html#eval-functions
Or, if you're using trunk, you can use Dmitriy's @OutputSchema annotation,
so you'd do
@OutputSchema("b:bag{t:tuple(x:chararray,y:chararray)}")
and it would do the magic for you :)
If you're not using trunk, it is definitely still easier to use
Utils.getSchemaFromString() instead of building it up.
Re: Bag of Tuples Return type of UDF
Posted by Jonathan Coveney <jc...@gmail.com>.
You can flatten, or you can override outputSchema so that you can specify
the output of the UDF.
You can find an example here:
http://pig.apache.org/docs/r0.9.1/udf.html#eval-functions
Or, if you're using trunk, you can use Dmitriy's @OutputSchema annotation,
so you'd do
@OutputSchema("b:bag{t:tuple(x:chararray,y:chararray)}")
and it would do the magic for you :)
If you're not using trunk, it is definitely still easier to use
Utils.getSchemaFromString() instead of building it up.
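As a rough illustration of the schema-string route, here is a minimal sketch of how the string passed to Utils.getSchemaFromString() could be put together. The class UrlPairSchema, the helper bagOfChararrayTuples, and the aliases pairs, t, url1 and url2 are made up for this example and are not part of Pig. The runnable part uses only the JDK; the Pig call itself is shown in comments and assumes Pig is on the classpath, so check the exact signature and checked exception against your Pig version.

```java
import java.util.StringJoiner;

// Hypothetical helper class for illustration only; not part of Pig.
final class UrlPairSchema {

    // Builds a Pig schema string for a bag of tuples whose fields are all chararray,
    // e.g. "b:bag{t:tuple(x:chararray,y:chararray)}".
    static String bagOfChararrayTuples(String bagAlias, String tupleAlias, String... fieldNames) {
        StringJoiner fields = new StringJoiner(",");
        for (String name : fieldNames) {
            fields.add(name + ":chararray");
        }
        return bagAlias + ":bag{" + tupleAlias + ":tuple(" + fields + ")}";
    }

    public static void main(String[] args) {
        String schema = bagOfChararrayTuples("pairs", "t", "url1", "url2");
        System.out.println(schema);

        // Inside the UDF, the string would then be parsed roughly like this
        // (assumption: ParserException is the checked exception in your Pig version):
        //
        //   public Schema outputSchema(Schema input) {
        //       try {
        //           return Utils.getSchemaFromString(schema);
        //       } catch (ParserException e) {
        //           throw new RuntimeException("Unable to parse schema: " + schema, e);
        //       }
        //   }
    }
}
```

In practice a hard-coded string literal is probably enough for a fixed two-field schema; the helper only makes the shape of the string explicit.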