Posted to user@pig.apache.org by paradisehit <pa...@163.com> on 2008/10/17 15:48:47 UTC
Why does the combine not act with flatten(many columns)?
I want to compute statistics (such as SUM/COUNT) over my data, and I may also use the SUM result to compute further values. So I wrote the following Pig Latin:
urls = LOAD 'logs' AS (url, ip, time, clicknum:int);
url_group = GROUP urls BY url;
clicks = FOREACH url_group GENERATE FLATTEN(url), SUM(urls.clicknum);
With this script the combiner is used, as the job counters show:
Counter                  Map  Reduce  Total
Reduce input groups        0       9      9
Combine output records     9       0      9
Map input records         11       0     11
But when I write it like this:
urls = LOAD 'logs' AS (url, ip, time, clicknum:int);
url_group = GROUP urls BY url;
clicks = FOREACH url_group GENERATE FLATTEN(url), FLATTEN(url_group.ip, url_group.time), SUM(urls.clicknum);
the combiner is not used:
Counter                  Map  Reduce  Total
Reduce input groups        0       9      9
Combine output records     0       0      0
Map input records         11       0     11
How can I use the combiner when I also want to keep the other fields, such as ip and time in this example?
Re:Re: Why does the combine not act with flatten(many columns)?
Posted by paradisehit <pa...@163.com>.
Oh, I am so sorry, I made a mistake in my post: it is not url_group.ip and url_group.time.
It should be urls.ip and urls.time. But that did not make the combiner work either!
On 2008-10-18, "Alan Gates" <ga...@yahoo-inc.com> wrote:
>In pig 0.1.0 the combiner is only invoked when the foreach is of the
>format:
>
>foreach X generate group, algebraicfunc(W) [, algebraicfunc(W)...]
>
>In the work on the types branch it is invoked any time your line
>consists only of simple projections and algebraic functions. It
>doesn't see url_group.ip as a simple projection (as you are
>projecting url_group and then ip from that), but there is a JIRA to
>add this (https://issues.apache.org/jira/browse/PIG-490 )
>
>Alan.
Re: Why does the combine not act with flatten(many columns)?
Posted by Alan Gates <ga...@yahoo-inc.com>.
In pig 0.1.0 the combiner is only invoked when the foreach is of the
format:
foreach X generate group, algebraicfunc(W) [, algebraicfunc(W)...]
In the work on the types branch it is invoked any time your line
consists only of simple projections and algebraic functions. It
doesn't see url_group.ip as a simple projection (as you are
projecting url_group and then ip from that), but there is a JIRA to
add this (https://issues.apache.org/jira/browse/PIG-490 )
Alan.
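For illustration (this is not part of the original reply), here is a minimal sketch of a script whose FOREACH has the "group, algebraicfunc(...)" shape described above, followed by one possible way to bring ip and time back in a separate step. The aliases totals, total_clicks, with_totals and result are hypothetical, the JOIN step is an assumption about a workaround rather than something suggested in this thread, and the syntax follows later Pig releases and has not been checked against 0.1.0.

```pig
-- Load the click log; field names are taken from the original post.
urls = LOAD 'logs' AS (url, ip, time, clicknum:int);

-- Group by url. The grouping key is exposed as `group`.
url_group = GROUP urls BY url;

-- Only the group key plus algebraic functions (SUM, COUNT):
-- the form the reply above says is eligible for the combiner.
totals = FOREACH url_group GENERATE
             group AS url,
             SUM(urls.clicknum) AS total_clicks;

-- Assumed workaround: re-attach the per-row fields in a separate step
-- instead of flattening urls.ip and urls.time in the aggregating FOREACH.
with_totals = JOIN urls BY url, totals BY url;
result = FOREACH with_totals GENERATE
             urls::url, urls::ip, urls::time, totals::total_clicks;
```

Whether the combiner is actually used can be checked as in the original post, by looking at the "Combine output records" counter of the job that computes totals.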