Posted to user@pig.apache.org by Cagri Balkesen <ca...@yahoo.com> on 2009/07/23 19:21:13 UTC

Second level of aggregation in foreach

Dear all,

I have some data in the following format:

//test
a       {(1,{(a,1),(a,2),(a,7)}),(2,{(a,20),(a,15),(a,12)}),(3,{(a,9),(a,7),(a,8)})}                                                                  
b       {(1,{(b,3),(b,4),(b,8)}),(2,{(b,14),(b,16),(b,19)}),(3,{(b,17),(b,16),(b,14)}),(4,{(b,8),(b,4),(b,3)})}
c       {(1,{(c,5),(c,6),(c,10)}),(2,{(c,13),(c,17),(c,18)}),(3,{(c,18),(c,17),(c,13)}),(4,{(c,11),(c,10),(c,6)})}

I load it as follows:
group2 = LOAD 'test' AS (grpkey:chararray, list: {  item:(id, tuples: {  mytuple:tuple( key:chararray, value:int ) } ) } );

Then I would like to compute the average of the values in each inner list, for each group, so the data is grouped at two nested levels. (By the way, I could not generate data in this shape from Pig itself, since GROUP BY is not allowed inside FOREACH.)

The result should look something like the following:
//groupkey { ( id, AVG(tuples) )}
a {(1, 3.33), (2, 15.66), (3, 8.0)}
b {(1, 5.0), (2, 16.33), (3, 15.66), (4, 5.0)}
c {(1, 7.0), (2, 16.0), (3, 16.0), (4, 9.0)}

//Or as follows:
//groupkey { AVG(tuples) }
a {(3.33), (15.66), (8.0)}
b {(5.0), (16.33), (15.66), (5.0)}
c {(7.0), (16.0), (16.0), (9.0)}
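
For reference, the first shape can probably be reached with several statements rather than a single FOREACH. The following is only a minimal, untested sketch: it assumes the schema from the LOAD statement above, and the alias names (flat, avgs, regrouped, result, avg_value) are made up for illustration.

```pig
-- Flatten the outer bag so that each (grpkey, id) pair becomes its own row,
-- carrying the inner bag of (key, value) tuples along with it.
flat = FOREACH group2 GENERATE grpkey, FLATTEN(list) AS (id, tuples);

-- Average the value field of the inner bag for each row.
avgs = FOREACH flat GENERATE grpkey, id, AVG(tuples.value) AS avg_value;

-- Collect the per-id averages back under their group key.
regrouped = GROUP avgs BY grpkey;
result = FOREACH regrouped GENERATE group, avgs.(id, avg_value);

DUMP result;
```

This leaves open whether the same thing can be expressed in one step, and the order of tuples inside each resulting bag is not guaranteed.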

Is it possible to do this in one FOREACH? Any suggestions are welcome.

-Cagri