Posted to user@pig.apache.org by Lai Will <la...@student.ethz.ch> on 2011/04/28 18:05:34 UTC

Group by, without having key in old relation

Hi there

Let's say I have

DUMP A
(user1, date1, {(item1), (item2)}, {(skill1), (skill2)})
(user1, date2, {(item3), (item4), (item5)}, {(skill1), (skill3)})
(user2, date2, {(item2), (item5)}, {(skill4)})

When I do

B = GROUP A BY user;

I get

user1 {(user1, date1, {(item1), (item2)}, {(skill1), (skill2)}), (user1, date2, {(item3), (item4), (item5)}, {(skill1), (skill3)})}
user2 {(user2, date2, {(item2), (item5)}, {(skill4)})}

This is actually fine, but it would be better if the tuples in each bag did not repeat the group-by key:

user1 {(date1, {(item1), (item2)}, {(skill1), (skill2)}), (date2, {(item3), (item4), (item5)}, {(skill1), (skill3)})}
user2 {(date2, {(item2), (item5)}, {(skill4)})}

What's the easiest way to achieve this?

Best,
Will

RE: Group by, without having key in old relation

Posted by Lai Will <la...@student.ethz.ch>.

Oh, I was not aware of that notation.
I just saw that it's documented as tuple dereference.

Thanks!

Best,
Will

-----Original Message-----
From: Jonathan Coveney [mailto:jcoveney@gmail.com] 
Sent: Thursday, 28 April 2011 20:38
To: user@pig.apache.org
Subject: Re: Group by, without having key in old relation

I believe this should work

B = GROUP A BY user;
C = FOREACH B GENERATE group, A.(datecol, itemscol, skillcol);

Re: Group by, without having key in old relation

Posted by Jonathan Coveney <jc...@gmail.com>.

I believe this should work

B = GROUP A BY user;
C = FOREACH B GENERATE group, A.(datecol, itemscol, skillcol);
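
For reference, here is a minimal end-to-end sketch of the same idea. The LOAD path, the chararray types, and the aliases records, t, item and skill are assumptions made for illustration, since the thread never shows how A was loaded. The field names datecol, itemscol and skillcol are the ones used in this reply. After GROUP A BY user, each row of B has two fields: group (the key) and a bag named A after the grouped relation, so the projection is applied to that bag.

```pig
-- Hypothetical input path and schema; adjust both to the real data.
A = LOAD 'input' AS (
        user:chararray,
        datecol:chararray,
        itemscol:bag{t:tuple(item:chararray)},
        skillcol:bag{t:tuple(skill:chararray)});

-- One row per user: (group, A), where A is the bag of original tuples.
B = GROUP A BY user;

-- Project only the non-key fields out of the bag, dropping user from each tuple.
C = FOREACH B GENERATE group AS user,
                       A.(datecol, itemscol, skillcol) AS records;

DESCRIBE C;  -- prints the resulting schema so the projection can be checked
DUMP C;
```

If the sketch's assumptions hold, C should contain one row per user, with a bag whose tuples carry only the date, items and skills fields.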
