Posted to user@pig.apache.org by Syed Wasti <md...@hotmail.com> on 2010/07/01 00:12:53 UTC
Re: FOREACH/GROUP BY question
I guess this is what you are looking for:
lsccnt = FOREACH lscg {
    dist_id = DISTINCT lsc.listener_id;
    GENERATE group.from_state, group.to_state, COUNT(dist_id);
};
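For readers less familiar with nested FOREACH: the block above builds, for each (from_state, to_state) group, the set of distinct listener_id values and counts it. The Python sketch below is only a model of that logic, not Pig code. The function name, the dict-based rows, and the sample state names are assumptions for illustration. It also mirrors the fact that Pig's COUNT skips nulls, so a null listener_id does not add to a group's count.

```python
from collections import defaultdict

def count_distinct_listeners(rows):
    """Model of the nested FOREACH: per (from_state, to_state) group,
    count the distinct non-null listener_id values."""
    groups = defaultdict(set)
    for row in rows:
        # Touching the key creates the group even if all its ids are null.
        ids = groups[(row["from_state"], row["to_state"])]
        # DISTINCT keeps a null id, but COUNT then skips it, so skip it here.
        if row["listener_id"] is not None:
            ids.add(row["listener_id"])
    # Assumes non-null group keys so the result tuples can be sorted.
    return sorted((frm, to, len(ids)) for (frm, to), ids in groups.items())

if __name__ == "__main__":
    sample = [  # hypothetical state names and ids
        {"from_state": "idle", "to_state": "playing", "listener_id": 1},
        {"from_state": "idle", "to_state": "playing", "listener_id": 1},
        {"from_state": "idle", "to_state": "playing", "listener_id": 2},
        {"from_state": "playing", "to_state": "idle", "listener_id": 3},
    ]
    print(count_distinct_listeners(sample))
    # [('idle', 'playing', 2), ('playing', 'idle', 1)]
```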
On 6/30/10 2:18 PM, "elein" <el...@varlena.com> wrote:
>
> lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30'
>     AS (daterecorded:chararray, listener_id:long, to_state:chararray,
>         from_state:chararray);
> describe lsc;
> lscg = group lsc by (from_state, to_state);
> describe lscg;
> -- lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc.listener_id);
> lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT(lsc);
>
> The first lsccnt line generates (,,0L) and the second generates (,,54321).
> What I want is tuples like
> (state1,state2,123)
> (state3,state2,456)
>
> And so on for each combination of from_state and to_state.
>
> What am I missing?
>
> elein
> elein@varlena.com
Re: FOREACH/GROUP BY question
Posted by elein <el...@varlena.com>.
On Jun 30, 2010, at 3:17 PM, Syed Wasti wrote:
> Or use the COUNT_STAR function to compute the number of elements in a bag.
>
> lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT_STAR(lsc);
I will try both of those. However, I've found that I'm dealing with an XML file instead
of a tab-separated file, and I need to figure out how to get access to the
loader UDF. Obviously I'm a newbie, just getting started, and my environment is not quite
together.
Thank you,
elein
elein@varlena.com
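A plausible explanation for the (,,0L) and (,,54321) output, consistent with the XML discovery above: LOAD without a USING clause uses PigStorage, which splits each line on tabs. An XML line containing no tabs would land entirely in the first field (daterecorded), leaving the other fields null, so every record falls into a single group whose key is null. COUNT does not count a tuple whose first field is null, so the projected listener_id bag counts 0, while the full lsc bag, whose first field is the whole line, counts every record. The Python sketch below models that chain. The names load_like_pigstorage and group_and_count, the sample XML element, and the padding of missing fields with null are assumptions for illustration, not a description of PigStorage's code.

```python
FIELDS = ["daterecorded", "listener_id", "to_state", "from_state"]

def load_like_pigstorage(line, field_names=FIELDS, delimiter="\t"):
    """Rough model of a tab-delimited load: split the line, then pad
    missing trailing fields with None (assumed loader behavior)."""
    parts = line.split(delimiter)
    values = parts + [None] * (len(field_names) - len(parts))
    return dict(zip(field_names, values))

def group_and_count(lines):
    """Group by (from_state, to_state), then apply both of elein's counts."""
    groups = {}
    for line in lines:
        row = load_like_pigstorage(line)
        key = (row["from_state"], row["to_state"])
        groups.setdefault(key, []).append(row)
    result = []
    for (frm, to), bag in groups.items():
        # COUNT(lsc.listener_id): the projected tuples have listener_id first.
        count_ids = sum(1 for r in bag if r["listener_id"] is not None)
        # COUNT(lsc): the full tuples have daterecorded first.
        count_rows = sum(1 for r in bag if r["daterecorded"] is not None)
        result.append((frm, to, count_ids, count_rows))
    return result

if __name__ == "__main__":
    xml_lines = ['<change listener_id="1"/>', '<change listener_id="2"/>']
    print(group_and_count(xml_lines))  # [(None, None, 0, 2)]
```

Under this model, switching to COUNT_STAR or adding DISTINCT would not change the null group keys; the records first need a loader that actually parses the XML into fields.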
Re: FOREACH/GROUP BY question
Posted by Syed Wasti <md...@hotmail.com>.
Or use the COUNT_STAR function to compute the number of elements in a bag.
lsccnt = FOREACH lscg generate group.from_state, group.to_state, COUNT_STAR(lsc);
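The difference between the two functions, per the Pig documentation, is null handling: COUNT does not count a tuple whose first field is null, while COUNT_STAR counts every tuple in the bag. A minimal Python model of the two rules follows; pig_count and pig_count_star are hypothetical names, None stands in for Pig's null, and tuples are assumed to be non-empty.

```python
def pig_count(bag):
    """Model of COUNT: a tuple is not counted if its first field is null."""
    return sum(1 for t in bag if t[0] is not None)

def pig_count_star(bag):
    """Model of COUNT_STAR: every tuple is counted, nulls included."""
    return len(bag)

if __name__ == "__main__":
    # Hypothetical bag of non-empty tuples.
    bag = [(None, "x"), ("2010-06-30", None), (None, None)]
    print(pig_count(bag), pig_count_star(bag))  # 1 3
```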
On 6/30/10 3:12 PM, "Syed Wasti" <md...@hotmail.com> wrote:
> I guess this is what you are looking for:
>
> lsccnt = FOREACH lscg {
> dist_id = DISTINCT lsc.listener_id;
> GENERATE group.from_state, group.to_state, COUNT(dist_id);
> };