Posted to user@pig.apache.org by Syed Wasti <md...@hotmail.com> on 2010/07/01 00:12:53 UTC

Re: FOREACH/GROUP BY question

I guess this is what you are looking for:

lsccnt = FOREACH lscg {
            -- count each distinct listener once per (from_state, to_state) group
            dist_id = DISTINCT lsc.listener_id;
            GENERATE group.from_state, group.to_state, COUNT(dist_id);
         };
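To make the nested FOREACH concrete, here is a minimal plain-Python model (not Pig itself) of what it computes: group rows by the (from_state, to_state) pair, then count the distinct listener_id values in each group. The sample rows and the helper name distinct_listeners_per_transition are hypothetical, and the model assumes no listener_id is null.

```python
from collections import defaultdict

# Hypothetical sample rows, in the field order of the LOAD ... AS clause:
# (daterecorded, listener_id, to_state, from_state)
ROWS = [
    ("2010-06-30 10:00", 1, "playing", "idle"),
    ("2010-06-30 10:05", 1, "playing", "idle"),  # same listener, second event
    ("2010-06-30 10:07", 2, "playing", "idle"),
    ("2010-06-30 10:09", 2, "idle", "playing"),
]

def distinct_listeners_per_transition(rows):
    """Model of GROUP lsc BY (from_state, to_state) followed by
    COUNT over DISTINCT lsc.listener_id in a nested FOREACH.
    Assumes listener_id is never null (Pig null handling not modeled)."""
    groups = defaultdict(set)
    for _date, listener_id, to_state, from_state in rows:
        # The group key is the (from_state, to_state) pair; the set keeps
        # each listener only once, like DISTINCT on the bag.
        groups[(from_state, to_state)].add(listener_id)
    # One output tuple per group: (from_state, to_state, distinct count).
    return sorted((frm, to, len(ids)) for (frm, to), ids in groups.items())

for row in distinct_listeners_per_transition(ROWS):
    print(row)
```

Note that the ("idle", "playing") group has three events but only two distinct listeners, so this counts listeners rather than state-change events; COUNT(lsc) would give 3 for that group.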


On 6/30/10 2:18 PM, "elein" <el...@varlena.com> wrote:

> 
> lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30'
>    AS (daterecorded:chararray, listener_id:long, to_state:chararray,
>        from_state:chararray);
> describe lsc;
> lscg = group lsc by (from_state, to_state);
> describe lscg;
> -- lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc.listener_id);
> lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc);
> 
> The first lsccnt line generates (,,0L) and the second generates (,,54321).
> What I want is tuples like
> (state1,state2,123)
> (state3,state2,456)
> 
> And so on for each combination of from_state and to_state.
> 
> What am I missing?
> 
> elein
> elein@varlena.com



Re: FOREACH/GROUP BY question

Posted by elein <el...@varlena.com>.

On Jun 30, 2010, at 3:17 PM, Syed Wasti wrote:

> Or use the COUNT_STAR function to compute the number of elements in a bag.
> 
> lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT_STAR(lsc);

I will try both of those things. However, I've found that I'm dealing with an XML file instead
of a tab-separated file, and I need to figure out how to get access to the
loader UDF. Obviously I'm a newbie who is just getting started, and my environment is not quite
together yet.
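The switch from a tab-separated file to XML would plausibly also explain the (,,0L) and (,,54321) results, though this is an inference and not something confirmed in the thread. A LOAD without a USING clause falls back to PigStorage, which splits each line on tab characters; an XML line with no tabs would land entirely in daterecorded, leaving listener_id, to_state, and from_state null. Every row would then fall into one group keyed (null, null), COUNT(lsc.listener_id) would be 0 because every listener_id is null, and COUNT(lsc) would count every row because its first field, daterecorded, is not null. The plain-Python model below (not Pig itself) sketches that chain; the sample lines and the helper names load_tab_delimited, pig_count, and transition_counts are hypothetical, and type casting such as listener_id:long is not modeled.

```python
SCHEMA = ("daterecorded", "listener_id", "to_state", "from_state")

def load_tab_delimited(line, schema=SCHEMA):
    """Rough model of a tab-delimited load with a declared schema:
    split on tabs, pad missing fields with None (standing in for Pig's
    null), and drop extra fields. Type casting is not modeled."""
    fields = line.split("\t")
    fields += [None] * (len(schema) - len(fields))
    return tuple(fields[: len(schema)])

def pig_count(bag):
    """Model of Pig's COUNT: skip tuples whose first field is null."""
    return sum(1 for t in bag if t[0] is not None)

def transition_counts(lines):
    """Group by (from_state, to_state) and return, per group,
    (from_state, to_state, COUNT(lsc.listener_id), COUNT(lsc))."""
    groups = {}
    for line in lines:
        row = load_tab_delimited(line)  # (daterecorded, listener_id, to_state, from_state)
        groups.setdefault((row[3], row[2]), []).append(row)
    results = []
    for (frm, to), bag in groups.items():
        id_bag = [(row[1],) for row in bag]  # like projecting lsc.listener_id
        results.append((frm, to, pig_count(id_bag), pig_count(bag)))
    return results

# Hypothetical inputs: XML lines contain no tabs; TSV lines do.
XML_LINES = [
    "<event><listener>1</listener><to>playing</to><from>idle</from></event>",
    "<event><listener>2</listener><to>idle</to><from>playing</from></event>",
]
TSV_LINES = ["2010-06-30\t1\tplaying\tidle", "2010-06-30\t2\tplaying\tidle"]

print(transition_counts(XML_LINES))  # [(None, None, 0, 2)]
print(transition_counts(TSV_LINES))  # [('idle', 'playing', 2, 2)]
```

In Pig's own output a null prints as an empty field, which would make the XML case look like (,,0L) rather than showing any state names.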

Thank you,

elein
elein@varlena.com





Re: FOREACH/GROUP BY question

Posted by Syed Wasti <md...@hotmail.com>.
Or use the COUNT_STAR function to compute the number of elements in a bag.

lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT_STAR(lsc);
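The difference between COUNT and COUNT_STAR comes down to nulls: according to the Pig documentation, COUNT does not count a tuple whose first field is null, whereas COUNT_STAR counts every tuple in the bag. The following plain-Python model of that rule is an illustrative sketch, not Pig's implementation; None stands in for Pig's null, and the function names and sample bag are hypothetical.

```python
def pig_count(bag):
    """Model of Pig's COUNT: tuples whose first field is null are skipped."""
    return sum(1 for t in bag if t[0] is not None)

def pig_count_star(bag):
    """Model of Pig's COUNT_STAR: every tuple in the bag is counted."""
    return len(bag)

# A hypothetical bag like lsc.listener_id, where some values are null.
listener_ids = [(101,), (None,), (102,), (None,)]

print(pig_count(listener_ids))       # 2
print(pig_count_star(listener_ids))  # 4
```

So when every listener_id is null, COUNT(lsc.listener_id) returns 0 while COUNT_STAR(lsc) still reports the full number of rows in the group.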

