Posted to user@pig.apache.org by Sheeba George <sh...@gmail.com> on 2010/12/02 09:48:14 UTC

Re: Writing filter function that takes constructor param?

Hi Daniel
  I have a related question. My UDF has a constructor that takes two int
parameters:

public TopUDF(int top, int type) {
    m_cnt = top;
    m_type = type;
}

But when I instantiate it with the define statement below, I get an error.
Am I doing something wrong?

define AggregateOthers com.ebay.ewa2.pig.load.TopUDF(3,0);
Thanks
Sheeba

On Tue, Nov 30, 2010 at 1:45 PM, Zach Bailey <za...@dataclip.com> wrote:

>
>  Thanks Daniel. Of course, you are right. Turns out I had a bug elsewhere
> in my UDF that was making me think this was not working correctly. After
> fixing that bug the "define ..." works fine.
>
>
> Using the following works great:
>
>
> define INITIALIZED_UDF com.my.udfs.UDF(constructor_params)
>
> Thanks,
> Zach
>
>
> On Tuesday, November 30, 2010 at 4:40 PM, Daniel Dai wrote:
>
> > Pig always instantiates the UDF using the constructor parameters given in
> > the "define" statement. CONTAINS_STRINGS(haystack) only passes haystack to
> > CONTAINS_STRINGS.exec(). It will not re-initialize the UDF.
> >
> > Daniel
> >
> > Zach Bailey wrote:
> >
> > >  I am trying to do what seems like it should be a simple task using Pig
> > >  and a UDF I have written, but I can't figure out the syntax to get it
> > >  working.
> > >
> > >  At a high level, I have a UDF that takes a number of strings and checks
> > >  whether any of them occur in some other strings. So I wrote a UDF called
> > >  CONTAINS_ANY that I would like to initialize with the "needles" and then
> > >  use Pig to distribute this search via Hadoop.
> > >
> > >  The problem is I can't figure out the correct syntax to initialize the
> > >  UDF with the "needles" and then use the UDF later once it has been
> > >  initialized. I have tried the following syntax:
> > >
> > >  define CONTAINS_STRINGS com.my.piggybank.CONTAINS_ANY('string1|string2');
> > >
> > >  and then invoking it with
> > >
> > >  filtered = FILTER data BY CONTAINS_STRINGS(haystack);
> > >
> > >  but this ends up re-initializing the UDF with the strings from the
> > >  haystack, which is not what I wanted.
> > >
> > >  Essentially, I want to write a UDF that works like the built-in MATCHES
> > >  function, so I can say something like:
> > >
> > >  filtered = FILTER data by haystack CONTAINS_STRINGS('string1|string2');
> > >
> > >  but so far I have been unable to find any useful or relevant
> > >  documentation on how to accomplish this.
> > >
> > >  Thanks a lot for any pointers or help anyone can give.
> > >
> > >  Best,
> > >  Zach


-- 
Sheeba Ann George
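
The behaviour Daniel describes in the quoted thread (the constructor receives the "define" arguments once, and only the per-row value reaches exec()) can be sketched as below. This is a minimal illustration, not Zach's actual UDF: the class name ContainsAny and the method matches() are hypothetical, and the Pig base class is left out so the sketch compiles without Pig on the classpath. In a real filter UDF the class would presumably extend org.apache.pig.FilterFunc, and the check would live in exec(Tuple).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a CONTAINS_ANY filter UDF (Pig base class omitted).
class ContainsAny {
    private final List<String> needles;

    // Called once, with the String argument from the define statement,
    // e.g. 'string1|string2'. The pipe is quoted so it is not read as a regex.
    ContainsAny(String pipeSeparatedNeedles) {
        needles = Arrays.asList(pipeSeparatedNeedles.split(Pattern.quote("|")));
    }

    // Called once per row with the haystack value; the needles stay fixed.
    boolean matches(String haystack) {
        if (haystack == null) {
            return false;
        }
        for (String needle : needles) {
            // Skip empty needles, which would otherwise match every haystack.
            if (!needle.isEmpty() && haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

The needles are split once in the constructor rather than on every row, which mirrors the split between "define" time and exec() time.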

Re: Writing filter function that takes constructor param?

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
As of now, UDFs are limited to Strings as constructor params.


Regards,
Mridul
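
Given that restriction, one possible workaround is to declare the constructor with String parameters and parse them into ints inside it. The sketch below assumes that approach; it reuses the TopUDF name and fields from the question, but the getters are hypothetical additions for illustration, and the Pig base class (org.apache.pig.EvalFunc in a real UDF) is omitted so the snippet compiles on its own.

```java
// Sketch only: a real Pig UDF would also extend EvalFunc and implement exec().
class TopUDF {
    private final int m_cnt;
    private final int m_type;

    // The define arguments arrive as Strings, so parse them here.
    // Integer.parseInt throws NumberFormatException on non-numeric input.
    TopUDF(String top, String type) {
        m_cnt = Integer.parseInt(top.trim());
        m_type = Integer.parseInt(type.trim());
    }

    // Hypothetical accessors, added only so the parsed values can be inspected.
    int getCount() {
        return m_cnt;
    }

    int getType() {
        return m_type;
    }
}
```

With a constructor like this, the define statement would pass quoted values:

    define AggregateOthers com.ebay.ewa2.pig.load.TopUDF('3', '0');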
