Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/05/06 21:36:00 UTC

[jira] [Created] (PIG-4536) LIMIT inside nested foreach should have combiner optimization

Rohini Palaniswamy created PIG-4536:
---------------------------------------

             Summary: LIMIT inside nested foreach should have combiner optimization
                 Key: PIG-4536
                 URL: https://issues.apache.org/jira/browse/PIG-4536
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy


data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
    B = LIMIT A.f3 1;

    GENERATE group,
        SUM(A.f3),
        SUM(A.f4),
        SUM(A.f5),
        SUM(A.f6),
        FLATTEN(B);
};

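Pig turns off the combiner optimization for a script like this because of the LIMIT inside the nested FOREACH. As a result, every tuple of each group is shipped to the reducers, and the job consumes a lot of memory and is slow. We should implement LIMIT using the combiner in cases like this.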
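To illustrate why LIMIT is a good candidate for the combiner, here is a minimal Python sketch. It is not Pig's implementation. The function names (limit_initial, limit_intermediate, limit_final, limit_with_combiner) are hypothetical and only loosely mirror the Initial/Intermediate/Final stages of Pig's Algebraic interface. The sketch assumes that any n tuples of a group are an acceptable result for LIMIT n, so each stage can truncate its input to n tuples and pass only those downstream.

```python
from itertools import chain, islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def limit_initial(values: Iterable[T], n: int) -> List[T]:
    """Map side (hypothetical): keep at most n values from one map task's
    share of a group."""
    return list(islice(values, n))


def limit_intermediate(partials: Iterable[List[T]], n: int) -> List[T]:
    """Combiner (hypothetical): merge partial results and truncate to n again.
    Truncating twice is safe because LIMIT only needs *some* n values."""
    return list(islice(chain.from_iterable(partials), n))


def limit_final(partials: Iterable[List[T]], n: int) -> List[T]:
    """Reducer (hypothetical): the same merge-and-truncate step."""
    return limit_intermediate(partials, n)


def limit_with_combiner(map_outputs: Iterable[Iterable[T]], n: int) -> List[T]:
    """Simulate one group whose values are spread over several map tasks."""
    combined = [
        limit_intermediate([limit_initial(chunk, n)], n)
        for chunk in map_outputs
    ]
    return limit_final(combined, n)


if __name__ == "__main__":
    # Three map tasks each hold part of the same group's f3 values.
    chunks = [[7, 3, 9], [4], [8, 1]]
    print(limit_with_combiner(chunks, 1))
```

In this simulation each map task forwards at most n values per group instead of its whole share of the bag, which is the memory and shuffle saving the issue asks for.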


