Posted to user@pig.apache.org by Scott <sk...@weather.com> on 2010/01/14 22:21:01 UTC

Basic 'SUM' question

Hello fellow Pig users.  I am brand new to Pig/Hadoop, and am having 
trouble with something that I am guessing is very basic.  I have a 
relation that I grouped by several fields, then counted the records in 
each group.  Here is a description of the relation:

count_grouped: {g1: (site: chararray, tf: chararray, pos: chararray,
                     country: chararray, state: chararray, dma: chararray),
                imp_count: long}


Now I would like to calculate the sum of the imp_count values to get a 
grand total.  The docs indicate I need to do a 'GROUP ALL' to compute a 
global sum, but I can't seem to make that work.  Any help would be 
appreciated.


Thanks,
Scott


Re: Basic 'SUM' question

Posted by Alan Gates <ga...@yahoo-inc.com>.
A general sum with group all can be done as:

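A = load 'file' as (x, y);
-- put every record of A into a single group
B = group A all;
-- built-in function names are case sensitive, so SUM must be uppercase
C = foreach B generate SUM(A.x);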

This will give you the sum of all x.  But from the schema you show  
below I'm not sure this is what you're trying to do.  Can you attach  
your script and an example record of your data?  This will make it  
easier to see what you're trying to do.

Alan.
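
For the schema in the question, the same pattern might look roughly like 
the sketch below.  It assumes the alias count_grouped already holds the 
relation described above; the names all_counts, grand_total and total are 
made up for illustration and do not come from the original script, which 
was not posted.

-- assumes count_grouped is already defined as described in the question
-- collapse the whole relation into one group holding every record
all_counts = GROUP count_grouped ALL;
-- project the imp_count column out of the grouped bag and sum it
grand_total = FOREACH all_counts GENERATE SUM(count_grouped.imp_count) AS total;
DUMP grand_total;

The field passed to SUM is addressed through the original alias 
(count_grouped.imp_count), because after GROUP ALL the records sit in a 
bag named after the relation that was grouped.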

On Jan 14, 2010, at 1:21 PM, Scott wrote:

> Hello fellow Pig users.  I am brand new to Pig/hadoop, and am having  
> trouble with something that I am guessing is very basic.  I have a  
> relation where I did a group by several values, then counted the  
> groups.  Here is a description of the relation:
>
> count_grouped: {g1: (site: chararray,tf: chararray,pos:  
> chararray,country: chararray,state: chararray,dma:  
> chararray),imp_count: long}
>
>
> Now I would like to calculate the sum of the imp_count value to get  
> a grand total.  The docs indicate I need to do a 'GROUP ALL' to  
> compute a global sum, but I can't seem to  make that work.  Any help  
> would be appreciated.
>
>
> Thanks,
> Scott
>