You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Cam Bazz <ca...@gmail.com> on 2011/01/30 12:10:58 UTC
join with sum
Hello,
Lets say I have two tables like
A:
1,11
2,15
and
B:
1,10
4,11
5,10
and joinin them
J = JOIN A by $0 FULL, B by $0
I get J:
1,11,1,21
2,16,,
,,4,11
,,5,10
which is a full outer join:
what I need is:
1,31
2,16
4,11
5,10
so I want to join by key, and then sum some fields.
the regular sum operator is for group or agregate summing and can not do it.
any ideas / recomendation / help greatly appreciated.
best.
c.b.
Re: join with sum
Posted by Jacob Perkins <ja...@gmail.com>.
I still believe you want the COGROUP operator:
A = LOAD '$A' AS (field_1:int, field_2:int);
DUMP A;
(1,11)
(2,15)
B = LOAD '$B' AS (field_1:int, field_2:int);
DUMP B;
(1,10)
(4,11)
(5,10)
grouped = COGROUP A BY field_1, B BY field_1;
DUMP grouped;
(1,{(1,11)},{(1,10)})
(2,{(2,15)},{})
(4,{},{(4,11)})
(5,{},{(5,10)})
summed = FOREACH grouped {
left_sum = SUM(A.field_2);
right_sum = SUM(B.field_2);
summed_fields = (left_sum IS NOT NULL ? left_sum : 0) +
(right_sum IS NOT NULL ? right_sum : 0);
GENERATE
FLATTEN(group) AS key,
summed_fields
;
};
DUMP summed;
(1,21L)
(2,15L)
(4,11L)
(5,10L)
It's messy because you need to account for when the cogroup misses. I
don't think it's exactly what your looking for, but then again it's not
entirely clear. This should at least get you on the right track.
--jacob
@thedatachef
On Sun, 2011-01-30 at 16:43 +0200, Cam Bazz wrote:
> Hello,
>
> I did look over to the GROUP operator, but could not find what i was
> looking for.
>
> I need to sum the values given by keys. This is because I want to do
> incremental processing.
>
> best.
>
> On Sun, Jan 30, 2011 at 4:01 PM, Jacob Perkins
> <ja...@gmail.com> wrote:
> > See the COGROUP operator
> >
> > --jacob
> > @thedatachef
> >
> > Sent from my iPhone
> >
> > On Jan 30, 2011, at 5:10 AM, Cam Bazz <ca...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> Lets say I have two tables like
> >>
> >> A:
> >> 1,11
> >> 2,15
> >>
> >> and
> >>
> >> B:
> >> 1,10
> >> 4,11
> >> 5,10
> >>
> >> and joinin them
> >>
> >> J = JOIN A by $0 FULL, B by $0
> >>
> >> I get J:
> >>
> >> 1,11,1,21
> >> 2,16,,
> >> ,,4,11
> >> ,,5,10
> >>
> >> which is a full outer join:
> >>
> >> what I need is:
> >>
> >> 1,31
> >> 2,16
> >> 4,11
> >> 5,10
> >>
> >> so I want to join by key, and then sum some fields.
> >>
> >> the regular sum operator is for group or agregate summing and can not do
> >> it.
> >>
> >> any ideas / recomendation / help greatly appreciated.
> >>
> >> best.
> >> c.b.
> >
Re: join with sum
Posted by Cam Bazz <ca...@gmail.com>.
Hello,
I did look over to the GROUP operator, but could not find what i was
looking for.
I need to sum the values given by keys. This is because I want to do
incremental processing.
best.
On Sun, Jan 30, 2011 at 4:01 PM, Jacob Perkins
<ja...@gmail.com> wrote:
> See the COGROUP operator
>
> --jacob
> @thedatachef
>
> Sent from my iPhone
>
> On Jan 30, 2011, at 5:10 AM, Cam Bazz <ca...@gmail.com> wrote:
>
>> Hello,
>>
>> Lets say I have two tables like
>>
>> A:
>> 1,11
>> 2,15
>>
>> and
>>
>> B:
>> 1,10
>> 4,11
>> 5,10
>>
>> and joinin them
>>
>> J = JOIN A by $0 FULL, B by $0
>>
>> I get J:
>>
>> 1,11,1,21
>> 2,16,,
>> ,,4,11
>> ,,5,10
>>
>> which is a full outer join:
>>
>> what I need is:
>>
>> 1,31
>> 2,16
>> 4,11
>> 5,10
>>
>> so I want to join by key, and then sum some fields.
>>
>> the regular sum operator is for group or agregate summing and can not do
>> it.
>>
>> any ideas / recomendation / help greatly appreciated.
>>
>> best.
>> c.b.
>
Re: join with sum
Posted by Jacob Perkins <ja...@gmail.com>.
See the COGROUP operator
--jacob
@thedatachef
Sent from my iPhone
On Jan 30, 2011, at 5:10 AM, Cam Bazz <ca...@gmail.com> wrote:
> Hello,
>
> Lets say I have two tables like
>
> A:
> 1,11
> 2,15
>
> and
>
> B:
> 1,10
> 4,11
> 5,10
>
> and joinin them
>
> J = JOIN A by $0 FULL, B by $0
>
> I get J:
>
> 1,11,1,21
> 2,16,,
> ,,4,11
> ,,5,10
>
> which is a full outer join:
>
> what I need is:
>
> 1,31
> 2,16
> 4,11
> 5,10
>
> so I want to join by key, and then sum some fields.
>
> the regular sum operator is for group or agregate summing and can
> not do it.
>
> any ideas / recomendation / help greatly appreciated.
>
> best.
> c.b.