You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Cam Bazz <ca...@gmail.com> on 2011/01/30 12:10:58 UTC

join with sum

Hello,

Lets say I have two tables like

A:
1,11
2,15

and

B:
1,10
4,11
5,10

and joinin them

J = JOIN A by $0 FULL, B by $0

I get J:

1,11,1,21
2,16,,
,,4,11
,,5,10

which is a full outer join:

what I need is:

1,31
2,16
4,11
5,10

so I want to join by key, and then sum some fields.

the regular sum operator is for group or agregate summing and can not do it.

any ideas / recomendation / help greatly appreciated.

best.
c.b.

Re: join with sum

Posted by Jacob Perkins <ja...@gmail.com>.
I still believe you want the COGROUP operator:


A = LOAD '$A' AS (field_1:int, field_2:int);
DUMP A;

(1,11)
(2,15)

B = LOAD '$B' AS (field_1:int, field_2:int);
DUMP B;

(1,10)
(4,11)
(5,10)

grouped = COGROUP A BY field_1, B BY field_1;
DUMP grouped;

(1,{(1,11)},{(1,10)})
(2,{(2,15)},{})
(4,{},{(4,11)})
(5,{},{(5,10)})

summed = FOREACH grouped {
           left_sum     = SUM(A.field_2);
           right_sum    = SUM(B.field_2);
           summed_fields = (left_sum IS NOT NULL ? left_sum : 0) +
(right_sum IS NOT NULL ? right_sum : 0);
           GENERATE
             FLATTEN(group) AS key,
             summed_fields
           ;
         };
DUMP summed;

(1,21L)
(2,15L)
(4,11L)
(5,10L)


It's messy because you need to account for when the cogroup misses. I
don't think it's exactly what your looking for, but then again it's not
entirely clear. This should at least get you on the right track.

--jacob
@thedatachef



On Sun, 2011-01-30 at 16:43 +0200, Cam Bazz wrote:
> Hello,
> 
> I did look over to the GROUP operator, but could not find what i was
> looking for.
> 
> I need to sum the values given by keys. This is because I want to do
> incremental processing.
> 
> best.
> 
> On Sun, Jan 30, 2011 at 4:01 PM, Jacob Perkins
> <ja...@gmail.com> wrote:
> > See the COGROUP operator
> >
> > --jacob
> > @thedatachef
> >
> > Sent from my iPhone
> >
> > On Jan 30, 2011, at 5:10 AM, Cam Bazz <ca...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> Lets say I have two tables like
> >>
> >> A:
> >> 1,11
> >> 2,15
> >>
> >> and
> >>
> >> B:
> >> 1,10
> >> 4,11
> >> 5,10
> >>
> >> and joinin them
> >>
> >> J = JOIN A by $0 FULL, B by $0
> >>
> >> I get J:
> >>
> >> 1,11,1,21
> >> 2,16,,
> >> ,,4,11
> >> ,,5,10
> >>
> >> which is a full outer join:
> >>
> >> what I need is:
> >>
> >> 1,31
> >> 2,16
> >> 4,11
> >> 5,10
> >>
> >> so I want to join by key, and then sum some fields.
> >>
> >> the regular sum operator is for group or agregate summing and can not do
> >> it.
> >>
> >> any ideas / recomendation / help greatly appreciated.
> >>
> >> best.
> >> c.b.
> >



Re: join with sum

Posted by Cam Bazz <ca...@gmail.com>.
Hello,

I did look over to the GROUP operator, but could not find what i was
looking for.

I need to sum the values given by keys. This is because I want to do
incremental processing.

best.

On Sun, Jan 30, 2011 at 4:01 PM, Jacob Perkins
<ja...@gmail.com> wrote:
> See the COGROUP operator
>
> --jacob
> @thedatachef
>
> Sent from my iPhone
>
> On Jan 30, 2011, at 5:10 AM, Cam Bazz <ca...@gmail.com> wrote:
>
>> Hello,
>>
>> Lets say I have two tables like
>>
>> A:
>> 1,11
>> 2,15
>>
>> and
>>
>> B:
>> 1,10
>> 4,11
>> 5,10
>>
>> and joinin them
>>
>> J = JOIN A by $0 FULL, B by $0
>>
>> I get J:
>>
>> 1,11,1,21
>> 2,16,,
>> ,,4,11
>> ,,5,10
>>
>> which is a full outer join:
>>
>> what I need is:
>>
>> 1,31
>> 2,16
>> 4,11
>> 5,10
>>
>> so I want to join by key, and then sum some fields.
>>
>> the regular sum operator is for group or agregate summing and can not do
>> it.
>>
>> any ideas / recomendation / help greatly appreciated.
>>
>> best.
>> c.b.
>

Re: join with sum

Posted by Jacob Perkins <ja...@gmail.com>.
See the COGROUP operator

--jacob
@thedatachef

Sent from my iPhone

On Jan 30, 2011, at 5:10 AM, Cam Bazz <ca...@gmail.com> wrote:

> Hello,
>
> Lets say I have two tables like
>
> A:
> 1,11
> 2,15
>
> and
>
> B:
> 1,10
> 4,11
> 5,10
>
> and joinin them
>
> J = JOIN A by $0 FULL, B by $0
>
> I get J:
>
> 1,11,1,21
> 2,16,,
> ,,4,11
> ,,5,10
>
> which is a full outer join:
>
> what I need is:
>
> 1,31
> 2,16
> 4,11
> 5,10
>
> so I want to join by key, and then sum some fields.
>
> the regular sum operator is for group or agregate summing and can  
> not do it.
>
> any ideas / recomendation / help greatly appreciated.
>
> best.
> c.b.