Posted to user@pig.apache.org by "Naber, Chad" <CN...@edmunds.com> on 2009/07/16 19:45:39 UTC

Problem with UDF returning tuple

Hello,

I am having a problem with Pig storing results from a user-defined function.  This function takes a chararray and creates a tuple of 2 values out of it.  When this tuple is passed back into Pig, it is stored as a single field containing the tuple instead of as two separate fields.  Have any of you dealt with storing tuples from a UDF?

Here is my pig code:

REGISTER pigudfs.jar;

EDW = LOAD '/apache-logs/20080707/edw.log' USING PigStorage() AS (t1:chararray);

CHAD = SAMPLE EDW 0.01;

VISITORTIME = FOREACH CHAD GENERATE pigudfs.visitorTimeParse(t1);

DESCRIBE VISITORTIME;

DUMP VISITORTIME;


Here is what it returns:

VISITORTIME: {(null)}

((123,45345))


Regards,
Chad


Re: Problem with UDF returning tuple

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Chad,
The behavior is consistent with Generate semantics.

Try
VISITORTIME = FOREACH CHAD GENERATE FLATTEN(pigudfs.visitorTimeParse(t1));


RE: Problem with UDF returning tuple

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
If you want the values inside the tuple to be present as individual
fields in the output, you should flatten() the output of the UDF:

VISITORTIME = FOREACH CHAD GENERATE flatten(pigudfs.visitorTimeParse(t1));
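
As a rough sketch of the same idea, an AS clause after FLATTEN can also give the resulting fields names. The names visitor_id and visit_time below are hypothetical (the thread does not say what the two values represent), and the sketch assumes the UDF always returns a tuple of exactly two fields:

```pig
-- Hypothetical field names; assumes visitorTimeParse returns a 2-field tuple.
VISITORTIME = FOREACH CHAD GENERATE
    FLATTEN(pigudfs.visitorTimeParse(t1)) AS (visitor_id, visit_time);

-- Re-check the schema and the data after flattening.
DESCRIBE VISITORTIME;
DUMP VISITORTIME;
```

Without FLATTEN, GENERATE emits the returned tuple as one nested field, which is why the original DUMP printed ((123,45345)) rather than (123,45345).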

-Pradeep
