Posted to dev@pig.apache.org by Brian Long <br...@dotspots.com> on 2009/05/23 00:37:47 UTC

UDF with parameters?

Hi,

I'm interested in developing a PERCENTILE UDF, e.g. for calculating a
median, 99th percentile, 90th percentile, etc. I'd like the UDF to be
parametric with respect to the percentile being requested, but I don't see
any way to do that. It seems I might need to create PERCENTILE_50,
PERCENTILE_90, etc. UDFs explicitly, rather than being able to write
something like GENERATE PERCENTILE(90, duration).

I'm new to Pig, so I might be missing the way to do this... is it possible?

Thanks,
Brian

RE: UDF with parameters?

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
You should be able to pass the percentile rank you want to calculate as
a UDF argument, just as you wrote it: GENERATE Percentile(90, duration).
Here 90 is an integer constant passed as the first argument to your UDF.
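
To make the argument-passing idea concrete, here is a minimal sketch in
plain Java. It deliberately leaves out the Pig API so that it compiles on
its own: the class name PercentileByArgument and the exec(int, List)
signature are hypothetical, and in a real UDF both values would be read
from the input tuple (the constant rank first, then the values). The
nearest-rank definition of a percentile is also an assumption; other
definitions interpolate between neighbours.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: models a UDF that receives the rank on every call.
final class PercentileByArgument {

    // rank:   the constant from the script, e.g. 90 in PERCENTILE(90, duration)
    // values: the numbers to summarise (assumed non-empty)
    static double exec(int rank, List<Double> values) {
        if (rank < 0 || rank > 100) {
            throw new IllegalArgumentException("rank must be in [0, 100]: " + rank);
        }
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        // Sort a copy so the caller's list is left untouched.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Nearest-rank method: 1-based position ceil(rank / 100 * n),
        // computed with integer arithmetic to avoid rounding surprises.
        int position = (int) (((long) rank * sorted.size() + 99) / 100);
        return sorted.get(Math.max(position, 1) - 1);
    }
}
```

The cost of this style is that the rank is re-read and re-validated on
every call, even though it never changes within one script.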

Re: UDF with parameters?

Posted by Alan Gates <ga...@yahoo-inc.com>.
Yes, it is possible. The UDF should take the percentile you want as a
constructor argument. It has to be passed as a string and converted
inside the constructor. Then, in your Pig Latin, use the DEFINE statement
to pass the argument to the constructor.

REGISTER /src/myfunc.jar;
DEFINE percentile myfunc.percentile('90');
A = LOAD 'students' as (name, gpa);
B = FOREACH A GENERATE percentile(gpa);

See http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html#DEFINE for  
more details.
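
On the Java side, the constructor approach might look roughly like the
sketch below. It is written without the Pig classes so it can be compiled
and tried standalone: PercentileSketch and compute(List) are hypothetical
names, whereas a real UDF would extend Pig's EvalFunc and pull its values
out of the input tuple (typically a bag produced by a GROUP). The
nearest-rank percentile definition is an assumption as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for myfunc.percentile, minus the Pig API.
final class PercentileSketch {
    private final double percentile;

    // DEFINE passes constructor arguments as strings, so the value is
    // parsed and validated once here rather than on every call.
    PercentileSketch(String percentile) {
        this.percentile = Double.parseDouble(percentile);
        if (!(this.percentile >= 0.0 && this.percentile <= 100.0)) {
            throw new IllegalArgumentException(
                "percentile must be in [0, 100]: " + percentile);
        }
    }

    // Nearest-rank percentile of a non-empty list of values.
    double compute(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        // Sort a copy so the caller's list is left untouched.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // 1-based rank ceil(p * n / 100), clamped to at least 1 for p = 0.
        int rank = (int) Math.ceil(percentile * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

With this shape, new PercentileSketch("90") plays the role of
DEFINE percentile myfunc.percentile('90'), and one class serves every
percentile instead of separate PERCENTILE_50 and PERCENTILE_90 classes.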

Alan.
