Posted to user@pig.apache.org by Dexin Wang <wa...@gmail.com> on 2011/01/12 23:51:58 UTC

wild card for all fields in a tuple

Hi,

Hope there is some simple answer to this. I have a bunch of rows, and for each
row I want to add a column which is derived from some existing columns. I
have a large number of columns in my input tuple, so I don't want to repeat
the names using "AS" when I generate. Is there an easy way to just append a
column to the tuples without having to touch the existing fields in the output?

Here's my example:

grunt> DESCRIBE X;
X: {id: chararray,v1: int,v2: int}

grunt> DUMP X;
(a,3,42)
(b,2,4)
(c,7,32)

I can do this:
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
grunt> DUMP Y;
(39,a,3,42)
(2,b,2,4)
(25,c,7,32)

But I would prefer not to have to list all the v's. I may have v1, v2, v3,
..., v100.

Of course this doesn't work:

grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);

What can be done to simplify this? And a related question: what is the schema
after the FOREACH? I wish I could do a DESCRIBE after the FOREACH.

Thanks !!

Re: wild card for all fields in a tuple

Posted by Jonathan Coveney <jc...@gmail.com>.
FOREACH a GENERATE function(thing), *; should do what you want. The * just throws in all the columns.
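
Applied to the example from the original post, that would look roughly like the sketch below. It assumes the same relation X with the fields id, v1 and v2; the alias Y and the field name diff are taken from the question. DESCRIBE on Y should also answer the question about the schema after the FOREACH.

```pig
-- Put the derived column first, then use * to project every existing field of X.
Y = FOREACH X GENERATE (v2 - v1) AS diff, *;

-- DESCRIBE works on any alias, including the result of a FOREACH.
DESCRIBE Y;
DUMP Y;
```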

Re: wild card for all fields in a tuple

Posted by Dexin Wang <wa...@gmail.com>.
Yeah, that works great. Thanks Jonathan and Alan. I can see that the
all-fields-in-between feature will be really useful in some cases.

On Wed, Jan 12, 2011 at 3:33 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> Jonathan is right, you can do all fields in a tuple with *.  I was thinking
> of doing all fields in between two fields, which you can't do yet.
>
> Alan.

Re: wild card for all fields in a tuple

Posted by Alan Gates <ga...@yahoo-inc.com>.
Jonathan is right, you can do all fields in a tuple with *. I was
thinking of doing all fields in between two fields, which you can't do
yet.

Alan.

On Jan 12, 2011, at 3:18 PM, Alan Gates wrote:

> There isn't a way to do that yet. See
> https://issues.apache.org/jira/browse/PIG-1693 for our plans on adding
> it in the next release.
>
> Alan.

Re: wild card for all fields in a tuple

Posted by Alan Gates <ga...@yahoo-inc.com>.
There isn't a way to do that yet. See
https://issues.apache.org/jira/browse/PIG-1693 for our plans on adding
it in the next release.

Alan.
