Posted to user@pig.apache.org by paradisehit <pa...@163.com> on 2008/10/17 15:48:47 UTC

Why does the combine not act with flatten(many columns)?

 
     I want to compute statistics (such as SUM/COUNT) over my data, and I may also use the SUM result to compute a further value. So I wrote Pig Latin like this:
    urls = LOAD 'logs' AS (url, ip, time, clicknum:int);
    url_group = GROUP urls BY url;
    clicks = FOREACH url_group GENERATE FLATTEN(url), SUM(urls.clicknum);

Yes, when I use this, the combiner is invoked.


Counter                  Map  Reduce  Total
Reduce input groups        0       9      9
Combine output records     9       0      9
Map input records         11       0     11



But when I write it like this:
    urls = LOAD 'logs' AS (url, ip, time, clicknum:int);
    url_group = GROUP urls BY url;
    clicks = FOREACH url_group GENERATE FLATTEN(url), FLATTEN(url_group.ip, url_group.time), SUM(urls.clicknum);

The combiner is not invoked.

 

Counter                  Map  Reduce  Total
Reduce input groups        0       9      9
Combine output records     0       0      0
Map input records         11       0     11

How can I use the combiner when I also want to keep the other information, such as ip and time in this example?

 

Re:Re: Why does the combine not act with flatten(many columns)?

Posted by paradisehit <pa...@163.com>.
 Oh, I am so sorry, I wrote it wrong: it's not url_group.ip and url_group.time.

It is actually urls.ip and urls.time. But the combiner is still not used!

 
 
 


On 2008-10-18, "Alan Gates" <ga...@yahoo-inc.com> wrote:
>In pig 0.1.0 the combiner is only invoked when the foreach is of the  
>format:
>
>foreach X generate group, algebraicfunc(W) [, algebraicfunc(W)...]
>
>In the work on the types branch it is invoked any time your line  
>consists only of simple projections and algebraic functions.  It  
>doesn't see url_group.ip as a simple projection (as you are  
>projecting url_group and then ip from that), but there is a JIRA to  
>add this (https://issues.apache.org/jira/browse/PIG-490 )
>
>Alan.

Re: Why does the combine not act with flatten(many columns)?

Posted by Alan Gates <ga...@yahoo-inc.com>.
In pig 0.1.0 the combiner is only invoked when the foreach is of the  
format:

foreach X generate group, algebraicfunc(W) [, algebraicfunc(W)...]

In the work on the types branch it is invoked any time your line  
consists only of simple projections and algebraic functions.  It  
doesn't see url_group.ip as a simple projection (as you are  
projecting url_group and then ip from that), but there is a JIRA to  
add this (https://issues.apache.org/jira/browse/PIG-490 )

Alan.
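
[Editorial illustration, not part of Alan's message.] The reply above ties combiner use to algebraic functions, meaning functions whose result can be built from partial results. The Python sketch below is a toy model of that idea, not Pig's implementation: the records, the two "map splits", and the helper names combine_sum, reduce_sum, and collect_ips are all hypothetical. It shows that SUM shrinks to one partial value per url on the map side, while a bag of ip values per url cannot shrink, which is one way to see why keeping every ip and time alongside SUM leaves nothing for a combiner to reduce.

```python
from collections import defaultdict

# Hypothetical log records (url, ip, time, clicknum), split across two map tasks.
MAP_SPLITS = [
    [("a.com", "1.1.1.1", 10, 2), ("a.com", "2.2.2.2", 11, 3), ("b.com", "1.1.1.1", 12, 1)],
    [("a.com", "3.3.3.3", 13, 4), ("b.com", "2.2.2.2", 14, 5)],
]


def combine_sum(split):
    """Map-side combine: collapse each url's clicknum values into one partial sum."""
    partial = defaultdict(int)
    for url, _ip, _time, clicknum in split:
        partial[url] += clicknum
    return list(partial.items())


def reduce_sum(combined_splits):
    """Reduce side: add the partial sums; valid because addition is associative."""
    total = defaultdict(int)
    for records in combined_splits:
        for url, partial in records:
            total[url] += partial
    return dict(total)


def collect_ips(split):
    """Projecting a bag of ips per url: every input value must still be shipped."""
    bags = defaultdict(list)
    for url, ip, _time, _clicknum in split:
        bags[url].append(ip)
    return dict(bags)


combined = [combine_sum(split) for split in MAP_SPLITS]
totals = reduce_sum(combined)

# Records that would cross the network without and with the map-side combine.
records_without_combiner = sum(len(split) for split in MAP_SPLITS)
records_with_combiner = sum(len(records) for records in combined)

# Values carried when the ip bag is kept: the same count as the raw input.
ip_values_shipped = sum(
    len(bag) for split in MAP_SPLITS for bag in collect_ips(split).values()
)

print(totals)
print(records_without_combiner, records_with_combiner, ip_values_shipped)
```

In this toy data the partial sums cut five records down to four before the reduce step, while the ip bags still carry all five values.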
