Posted to user@pig.apache.org by Dexin Wang <wa...@gmail.com> on 2011/01/12 23:51:58 UTC
wild card for all fields in a tuple
Hi,
I hope there is a simple answer to this. I have a bunch of rows, and for each
row I want to add a column derived from some existing columns. I also have a
large number of columns in my input tuple, so I don't want to repeat every
field name in the GENERATE clause. Is there an easy way to just append a
column to each tuple without having to list the existing fields in the output?
Here's my example:
grunt> DESCRIBE X;
X: {id: chararray,v1: int,v2: int}
grunt> DUMP X;
(a,3,42)
(b,2,4)
(c,7,32)
I can do this:
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
grunt> DUMP Y;
(39,a,3,42)
(2,b,2,4)
(25,c,7,32)
But I would prefer not to have to list all the v's. I may have v1, v2, v3,
..., v100.
Of course, this doesn't work:
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);
What can be done to simplify this? And a related question: what is the schema
after the FOREACH? I wish I could do a DESCRIBE after the FOREACH.
Thanks!
Re: wild card for all fields in a tuple
Posted by Jonathan Coveney <jc...@gmail.com>.
FOREACH a GENERATE function(thing), *; should do what you want. The * just throws in all the columns.
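A minimal sketch of Jonathan's suggestion applied to the X relation from the question (schema {id: chararray, v1: int, v2: int}). It also touches the side question: DESCRIBE can be run on any alias, including one produced by a FOREACH. The FLATTEN(X) attempt presumably fails because X names the relation being iterated, not a field inside its tuples. The exact DESCRIBE formatting may differ between Pig versions.

```pig
-- Assumes X is already loaded with schema {id: chararray, v1: int, v2: int}.
-- Prepend the derived column, then project every existing field with *.
Y = FOREACH X GENERATE (v2 - v1) AS diff, *;
DESCRIBE Y;  -- should list diff (int) followed by id, v1, v2
DUMP Y;      -- same rows as the explicit version: (39,a,3,42), (2,b,2,4), (25,c,7,32)

-- To append the derived column at the end instead, put * first.
Y2 = FOREACH X GENERATE *, (v2 - v1) AS diff;
```

Because * expands to whatever fields X has, the same statement keeps working if X grows to v1 ... v100.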
Re: wild card for all fields in a tuple
Posted by Dexin Wang <wa...@gmail.com>.
Yeah, that works great. Thanks, Jonathan and Alan. I can see that the "all fields
in between" feature will be really useful in some cases.
On Wed, Jan 12, 2011 at 3:33 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
> Jonathan is right: you can get all fields in a tuple with *. I was thinking
> of getting all fields between two fields, which you can't do yet.
>
> Alan.
Re: wild card for all fields in a tuple
Posted by Alan Gates <ga...@yahoo-inc.com>.
Jonathan is right: you can get all fields in a tuple with *. I was
thinking of getting all fields between two fields, which you can't do
yet.
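For the "all fields between two fields" case, here is a sketch of the range-projection syntax tracked in PIG-1693, assuming a Pig release that includes it (believed to be 0.9 or later; check your version). The relation W and its fields v1 through v100 are hypothetical, mirroring the wide input described in the question.

```pig
-- Hypothetical relation W with schema (id, v1, v2, ..., v100).
-- Requires a Pig version with range projection (PIG-1693).
Y = FOREACH W GENERATE (v2 - v1) AS diff, v1 .. v100;  -- v1 through v100, inclusive
Z = FOREACH W GENERATE id, v50 ..;                     -- v50 through the last field
P = FOREACH W GENERATE .. v10;                         -- first field through v10
```

Named ranges like these rely on W having a declared schema, so Pig can resolve where each endpoint sits.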
Alan.
On Jan 12, 2011, at 3:18 PM, Alan Gates wrote:
> There isn't a way to do that yet. See https://issues.apache.org/jira/browse/PIG-1693
> for our plans to add it in the next release.
>
> Alan.
Re: wild card for all fields in a tuple
Posted by Alan Gates <ga...@yahoo-inc.com>.
There isn't a way to do that yet. See https://issues.apache.org/jira/browse/PIG-1693
for our plans to add it in the next release.
Alan.