Posted to user@pig.apache.org by Marco Cova <ma...@gmail.com> on 2012/03/16 00:03:03 UTC
python filter udfs
Hi all.
I'm trying to write a simple filter function (to be used with the FILTER operator) in Python, but I can't seem to find the right way to specify its schema. I'm using Pig 0.9.2.
The filter's code is (trivially):
def trivial_filter(s):
    return True
What's the right way to annotate it so that Pig understands it returns a boolean?
I've tried with:
- @outputSchema("b:boolean"), but this causes:
ERROR 1200: <line 1, column 2> Syntax error, unexpected symbol at or near 'boolean
- @outputSchema("b:int") is also rejected (as expected):
ERROR 1058:
<file pdns-long-nxdomains.pig, line 9, column 17> Filter's condition must evaluate to boolean. Found: int
Thanks,
Marco
Re: python filter udfs
Posted by Marco Cova <ma...@gmail.com>.
Jonathan,
Thanks: this will do.
Marco
On Mar 15, 2012, at 5:34 PM, Jonathan Coveney wrote:
> I don't know if you can do a filterfunc per se, but a hack would be to
> return an int, and do 1 if true and 0 otherwise, and filter by
> yourudf(input)==1
Re: python filter udfs
Posted by Jonathan Coveney <jc...@gmail.com>.
I don't know if you can do a filterfunc per se, but a hack would be to
return an int (1 if true and 0 otherwise) and filter by
yourudf(input)==1
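A minimal sketch of that workaround, assuming the UDF lives in a file called filters.py (a hypothetical name) and that Pig supplies the outputSchema decorator when it loads the script through Jython. The predicate itself (long_name_filter, keeping strings longer than 20 characters) is made up purely for illustration; the fallback decorator is only there so the file can also be run under plain Python.

```python
# Pig makes outputSchema available to Jython UDF scripts when it loads them.
# This fallback is an assumption-free stand-in used only when the file is run
# outside Pig (e.g. for a quick local check).
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("keep:int")
def long_name_filter(s):
    # Hypothetical predicate: return 1 (keep) for strings longer than
    # 20 characters, 0 (drop) otherwise. An int is returned instead of a
    # boolean because "b:boolean" is rejected as a schema in Pig 0.9.2.
    if s is not None and len(s) > 20:
        return 1
    return 0
```

On the Pig side, the script would then compare the result to 1 so that the FILTER condition evaluates to a boolean (the relation and field names here are placeholders):

    REGISTER 'filters.py' USING jython AS myudfs;
    kept = FILTER records BY myudfs.long_name_filter(name) == 1;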