Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/04/22 01:31:05 UTC

[jira] [Created] (PIG-2009) Better MergeForEach rule

Better MergeForEach rule
------------------------

                 Key: PIG-2009
                 URL: https://issues.apache.org/jira/browse/PIG-2009
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.9.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.10


The MergeForEach rule will not merge two consecutive ForEach operators if the second ForEach has an inner relational plan. This prevents some optimizations. For example:
{code}
A = LOAD 'input1' AS (a0, a1, a2);
B = LOAD 'input2' AS (b0, b1, b2);
C = cogroup A by a0, B by b0;
D = foreach C { E = limit A 10; F = E.a1; G = DISTINCT F; generate group, COUNT(G);};
explain D;
{code}
We add a ForEach after the cogroup to prune B; however, we cannot merge this ForEach with D. Secondary key optimization for this query is thus disabled.
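
To make the problem concrete, the plan after pruning can be pictured as the hand-written script below. This is only a sketch of the plan's shape, not the optimizer's actual output: the alias C1 is hypothetical (the inserted ForEach has no user-visible alias), and any further pruning of unused columns of A is ignored here.
{code}
A = LOAD 'input1' AS (a0, a1, a2);
B = LOAD 'input2' AS (b0, b1, b2);
C = cogroup A by a0, B by b0;
-- Hypothetical alias C1: stands in for the ForEach added after the cogroup to drop the unused bag B.
C1 = foreach C generate group, A;
-- D has an inner relational plan (limit, distinct), so MergeForEach leaves C1 and D as two separate ForEach operators.
D = foreach C1 { E = limit A 10; F = E.a1; G = DISTINCT F; generate group, COUNT(G);};
explain D;
{code}
If the two ForEach operators were merged, D would again sit directly on top of the cogroup, as in the original script, and the secondary key optimization could presumably apply.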

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2009) Better MergeForEach rule

Posted by "Olga Natkovich (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-2009:
--------------------------------

    Fix Version/s:     (was: 0.10)
    
