Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/05/06 21:36:00 UTC
[jira] [Created] (PIG-4536) LIMIT inside nested foreach should have combiner optimization
Rohini Palaniswamy created PIG-4536:
---------------------------------------
Summary: LIMIT inside nested foreach should have combiner optimization
Key: PIG-4536
URL: https://issues.apache.org/jira/browse/PIG-4536
Project: Pig
Issue Type: Improvement
Reporter: Rohini Palaniswamy
data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
    B = LIMIT A.f3 1;
    GENERATE group,
             SUM(A.f3),
             SUM(A.f4),
             SUM(A.f5),
             SUM(A.f6),
             FLATTEN(B);
};
Pig turns off the combiner optimization for a script like this because of the nested LIMIT, so the job consumes a lot of memory and runs slowly. We should implement LIMIT in the combiner for cases like this.
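
To illustrate why LIMIT looks like a good fit for the combiner, here is a minimal sketch in plain Python (not Pig code, and not how Pig implements its combiner). It assumes that a limit of n over a whole bag can be obtained by taking at most n values from each map-side partition and applying the limit again when the partials are merged. The names limit, partial_agg, merge and the sample partitions are hypothetical.

```python
from itertools import islice


def limit(values, n):
    """Return at most n items from values, like a nested LIMIT."""
    return list(islice(values, n))


def partial_agg(rows, n):
    """Combiner-style step: reduce one partition to a small partial result."""
    # Only the running sum and at most n values leave the partition,
    # instead of the whole bag.
    return {"sum": sum(rows), "limited": limit(rows, n)}


def merge(partials, n):
    """Reducer-style step: add the partial sums and re-apply the limit."""
    total = sum(p["sum"] for p in partials)
    merged = limit((v for p in partials for v in p["limited"]), n)
    return total, merged


# Hypothetical f3 values for one group key, spread over three partitions.
partitions = [[4, 7, 1], [9, 2], [5]]
total, limited = merge([partial_agg(p, 1) for p in partitions], 1)
print(total, limited)
```

In this sketch each partition forwards one sum and at most one value, so the final step never has to hold the full bag for the group. Which value survives the limit depends on partition order, which matches LIMIT making no ordering guarantee without an ORDER BY.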
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)