Posted to user@pig.apache.org by Lai Will <la...@student.ethz.ch> on 2011/04/28 18:05:34 UTC
Group by, without having key in old relation
Hi there
Let's say I have
DUMP A
(user1, date1, {(item1), (item2)}, {(skill1), (skill2)})
(user1, date2, {(item3), (item4), (item5)}, {(skill1), (skill3)})
(user2, date2, {(item2), (item5)}, {(skill4)})
When I do
B = GROUP A BY user;
I get
user1 {(user1, date1, {(item1), (item2)}, {(skill1), (skill2)}), (user1, date2, {(item3), (item4), (item5)}, {(skill1), (skill3)})}
user2 {(user2, date2, {(item2), (item5)}, {(skill4)})}
This is actually fine, but it would be better to have the grouped tuples without the group-by key in them:
user1 {(date1, {(item1), (item2)}, {(skill1), (skill2)}), (date2, {(item3), (item4), (item5)}, {(skill1), (skill3)})}
user2 {(date2, {(item2), (item5)}, {(skill4)})}
What's the easiest way to achieve this?
Best,
Will
RE: Group by, without having key in old relation
Posted by Lai Will <la...@student.ethz.ch>.
Oh, I was not aware of that notation.
Just saw that it's documented as tuple dereference.
Thanks!
Best,
Will
Re: Group by, without having key in old relation
Posted by Jonathan Coveney <jc...@gmail.com>.
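I believe this should work (datecol, itemscol and skillcol stand for your own field names; the bag produced by GROUP is named after the grouped relation, A):
B = GROUP A BY user;
C = FOREACH B GENERATE group, A.(datecol, itemscol, skillcol);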
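For anyone who wants to try this end to end, here is a minimal sketch of the same idea as a complete script. The input path 'user_activity' and the field names (user, dt, items, skills) are assumptions made up for illustration and do not come from the original post, and the LOAD schema is only one plausible way to declare the nested bags. The point it shows is that GROUP returns the key in a field called group plus a bag named after the grouped relation (A here), and that projecting on that bag drops the key from the inner tuples.

```pig
-- Hypothetical input path and field names; adjust both to your data.
A = LOAD 'user_activity' AS (
        user:chararray,
        dt:chararray,
        items:bag{t:tuple(item:chararray)},
        skills:bag{t:tuple(skill:chararray)});

-- GROUP yields two fields: 'group' (the key) and a bag named 'A'
-- holding the complete original tuples, key included.
B = GROUP A BY user;

-- Projecting on the bag keeps only the listed fields of each inner tuple,
-- so 'user' no longer appears inside the bag.
C = FOREACH B GENERATE group AS user, A.(dt, items, skills) AS records;

DUMP C;
```

Running DESCRIBE B; before the FOREACH is a quick way to confirm what the bag is actually called in your script.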