Posted to user@pig.apache.org by de...@wipro.com on 2011/01/07 19:31:22 UTC

Error 1000: UDF Python

Hi,

I have a Python UDF that is used by a Pig script.

I get a parsing error for some reason.

------------

REGISTER '/path/to/udf.py' USING jython AS udf;
records = LOAD 'path/to/data' AS (input_line:chararray);
schema_records = FOREACH records GENERATE udf.split_into_words(input_line);
projected_records = FOREACH schema_records GENERATE field1, field2;
DUMP schema_records;

 ----------

Here's the Python UDF:

@outputSchema("t:(field1:chararray, field2:chararray)")
def split_into_words(input_line):
    line = input_line.strip()
    words = line.split()
    return (words[0], words[1])

--------------

The error I get is:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: field1 in {t: (field1: chararray, field2: chararray)

What am I doing wrong?

Please do not print this email unless it is absolutely necessary. 

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. 

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. 

www.wipro.com

RE: Error 1000: UDF Python

Posted by de...@wipro.com.

Thanks! That worked! 

-----Original Message-----
From: Jonathan Coveney [mailto:jcoveney@gmail.com] 
Sent: Saturday, January 08, 2011 12:13 AM
To: user@pig.apache.org
Subject: Re: Error 1000: UDF Python

It also looks like you can just refer to the tuple's fields directly. So you
have two options:

schema_records = FOREACH records GENERATE FLATTEN(udf.split_into_words(input_line));

OR

projected_records = FOREACH schema_records GENERATE t.field1, t.field2;


where t is the name of the tuple in the schema of your Python UDF.

Re: Error 1000: UDF Python

Posted by Jonathan Coveney <jc...@gmail.com>.

It also looks like you can just refer to the tuple's fields directly. So you
have two options:

schema_records = FOREACH records GENERATE FLATTEN(udf.split_into_words(input_line));

OR

projected_records = FOREACH schema_records GENERATE t.field1, t.field2;


where t is the name of the tuple in the schema of your Python UDF.
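
To make the difference between the two options concrete, here is a rough
Python model of the schemas involved. It is only an illustration: the nested
dict layout and the helpers resolve() and flatten() are made up for this
sketch and are not part of Pig. The t::field1 naming is what I would expect
DESCRIBE to show after FLATTEN, so treat it as an assumption and check it on
your own install.

```python
# Toy model of Pig schemas as nested dicts. Not a Pig API: resolve() and
# flatten() are hypothetical helpers that exist only in this sketch.

# Schema of schema_records WITHOUT FLATTEN: a single field, the tuple t.
nested = {"t": {"field1": "chararray", "field2": "chararray"}}


def resolve(schema, alias):
    """Look up a dotted alias such as 't.field1'; raise KeyError if invalid."""
    node = schema
    for part in alias.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError("Invalid alias: %s" % alias)
        node = node[part]
    return node


def flatten(schema, name):
    """Promote the fields of tuple `name` to the top level as 'name::field'."""
    out = {}
    for key, value in schema.items():
        if key == name and isinstance(value, dict):
            for sub, typ in value.items():
                out["%s::%s" % (name, sub)] = typ
        else:
            out[key] = value
    return out


# Schema of schema_records WITH FLATTEN: two top-level fields.
flat = flatten(nested, "t")
```

In this model resolve(nested, "field1") raises, which corresponds to the
"Invalid alias" error, while resolve(nested, "t.field1") succeeds. After
flatten() the fields sit at the top level. In Pig itself the short name
(field1) should also resolve after FLATTEN as long as it is unambiguous.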

Re: Error 1000: UDF Python

Posted by Jonathan Coveney <jc...@gmail.com>.

The result of your UDF is a tuple, so field1 and field2 don't exist at the
top level of schema_records. Try GENERATE
FLATTEN(udf.split_into_words(input_line)); and then run DESCRIBE on
schema_records to see what the columns are called.
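
If it helps to see the "result is a tuple" point outside Pig, here is a
minimal sketch for running the UDF in a plain Python interpreter. Pig supplies
the outputSchema decorator when it registers the script. The fallback
definition below is my own stand-in for local testing (an assumption, not
Pig's implementation), and the sample input string is made up.

```python
# Pig provides outputSchema when it registers a Jython script. The fallback
# below is an assumed stand-in so the file also runs outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            # Keep the schema string on the function for inspection.
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("t:(field1:chararray, field2:chararray)")
def split_into_words(input_line):
    # Split on whitespace and return the first two words as ONE tuple value.
    words = input_line.strip().split()
    return (words[0], words[1])


# Hypothetical sample line; real input comes from the LOAD statement.
result = split_into_words("hello world again")
```

The function hands back a single value, a 2-tuple, and the schema string names
that value t. So each row of schema_records holds one field (t) with field1
and field2 nested inside it, which is why the bare names are not found until
the tuple is flattened or addressed as t.field1.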
