Posted to user@pig.apache.org by Marco Cova <ma...@gmail.com> on 2012/03/16 00:03:03 UTC

python filter udfs

Hi all.

I'm trying to write a simple filter function (to be used with the FILTER operator) in Python, but I can't seem to find the right way to specify its schema. I'm using Pig 0.9.2.

The filter's code is (trivially):
	def trivial_filter(s):
		return True
What's the right way of annotating it so that Pig understands it returns a boolean?

I've tried with:
- @outputSchema("b:boolean"), but this causes:
ERROR 1200: <line 1, column 2>  Syntax error, unexpected symbol at or near 'boolean
- @outputSchema("b:int") is also rejected (as expected):
ERROR 1058: 
<file pdns-long-nxdomains.pig, line 9, column 17> Filter's condition must evaluate to boolean. Found: int
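Why the first attempt fails can be sketched in plain Python. Assuming Pig's Jython engine injects an outputSchema decorator that only stores the schema string on the function (the fallback below is a stand-in so the file also runs outside Pig), Python itself accepts "b:boolean" without complaint; the string is parsed later by Pig's own schema parser. That would also explain the "<line 1, column 2>" in ERROR 1200: the coordinates appear to point into the schema string, where "boolean" starts at 0-based column 2.

```python
# Assumption: when the script is REGISTERed, Pig's Jython engine defines
# `outputSchema` before running it. This fallback only kicks in when the
# file is run with a plain Python interpreter.
try:
    outputSchema
except NameError:
    def outputSchema(schema_def):
        def decorator(func):
            # Record the schema string on the function object; Pig is
            # assumed to read and parse this attribute later.
            func.outputSchema = schema_def
            return func
        return decorator


@outputSchema("b:boolean")
def trivial_filter(s):
    return True


# Python does not validate the schema; it is just a stored string.
print(trivial_filter.outputSchema)       # b:boolean
# Column 2 (0-based) of the schema string is where the rejected type begins.
print(trivial_filter.outputSchema[2:])   # boolean
```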

Thanks,
Marco



Re: python filter udfs

Posted by Marco Cova <ma...@gmail.com>.
Jonathan,

Thanks: this will do.

Marco

On Mar 15, 2012, at 5:34 PM, Jonathan Coveney wrote:

> I don't know if you can do a filterfunc per se, but a hack would be to
> return an int, and do 1 if true and 0 otherwise, and filter by
> yourudf(input)==1


Re: python filter udfs

Posted by Jonathan Coveney <jc...@gmail.com>.
I don't know if you can write a FilterFunc per se, but a hack would be to
return an int, 1 if true and 0 otherwise, and filter by
yourudf(input)==1
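The int workaround above might look like the following on the Python side. This is a minimal sketch: keep_record is a hypothetical predicate standing in for the real filtering logic, and the outputSchema fallback is again only a stand-in for the decorator Pig's Jython engine is assumed to provide.

```python
# Stand-in for the decorator Pig's Jython engine is assumed to inject;
# only defined when the file is run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema_def):
        def decorator(func):
            func.outputSchema = schema_def
            return func
        return decorator


def keep_record(s):
    """Hypothetical predicate: keep non-empty, non-null values."""
    return bool(s)


@outputSchema("keep:int")
def trivial_filter(s):
    # Pig 0.9.2 rejects a boolean output schema, so encode the result
    # as an int: 1 means keep the record, 0 means drop it.
    return 1 if keep_record(s) else 0
```

On the Pig side the script would then be registered and the result compared against 1, e.g. REGISTER 'filters.py' USING jython AS myudfs; followed by good = FILTER records BY myudfs.trivial_filter(s) == 1; (the file, alias, and relation names are hypothetical). Pig 0.10 introduced a native boolean type, so on later releases the original @outputSchema("b:boolean") annotation may work directly; on 0.9.2 the int encoding is the workaround.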
