Posted to dev@pig.apache.org by "Swati Jain (JIRA)" <ji...@apache.org> on 2010/07/31 23:21:16 UTC

[jira] Created: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup

 PIG Logical Optimization: Push LOFilter above LOCogroup
--------------------------------------------------------

                 Key: PIG-1530
                 URL: https://issues.apache.org/jira/browse/PIG-1530
             Project: Pig
          Issue Type: New Feature
          Components: impl
            Reporter: Swati Jain
            Assignee: Swati Jain
            Priority: Minor
             Fix For: 0.8.0


Consider the following:
{noformat}
A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
B = load '<any file>' USING PigStorage(',') as (b1:int,b2:int,b3:int);
G = COGROUP A by (a1,a2) , B by (b1,b2);
D = Filter G by group.$0 + 5 > group.$1;
explain D;
{noformat}

In the above example, LOFilter can be pushed above LOCogroup. Note that there are some tricky NULL issues to consider when the Cogroup is not of type INNER, similar to the issues that arise when pushing LOFilter to the right side of a LeftOuterJoin.

Also note that in user programs the LOFilter will typically be below a ForEach-Cogroup pair. To make this optimization really useful, we also need to implement pushing LOFilter across ForEach.
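
To illustrate why the rewrite should be safe in the simple case, here is a minimal Python sketch (not Pig code) that simulates a cogroup on two key columns and compares filtering after the cogroup with filtering each input before it. The helper names (cogroup, key, pred) and the sample rows are made up for illustration, and the sketch assumes non-null keys and a non-INNER cogroup, so it does not cover the NULL issues mentioned above.

```python
from collections import defaultdict


def cogroup(a_rows, b_rows, key):
    """Toy COGROUP: map each key to a (bag from A, bag from B) pair."""
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[key(row)][0].append(row)
    for row in b_rows:
        groups[key(row)][1].append(row)
    return dict(groups)


def key(row):
    # Group on the first two columns, like "A by (a1,a2)".
    return (row[0], row[1])


def pred(k):
    # Same shape as "group.$0 + 5 > group.$1".
    return k[0] + 5 > k[1]


# Hypothetical sample data; keys are assumed to be non-null.
A = [(1, 2, 10), (1, 9, 11), (4, 3, 12)]
B = [(1, 2, 20), (0, 7, 21), (4, 3, 22)]

# Original plan: cogroup first, then filter on the group key.
filter_after = {k: bags for k, bags in cogroup(A, B, key).items() if pred(k)}

# Rewritten plan: filter each input on its own key columns, then cogroup.
filter_before = cogroup(
    [r for r in A if pred(key(r))],
    [r for r in B if pred(key(r))],
    key,
)

assert filter_after == filter_before
```

Because the predicate only reads the group key, and every row in a group carries that same key, filtering the inputs removes exactly the rows whose groups the original filter would have dropped.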

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-1530.
---------------------------------

    Resolution: Duplicate

Xuefu is addressing this issue as part of https://issues.apache.org/jira/browse/PIG-1575.



[jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup

Posted by "Swati Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894467#action_12894467 ] 

Swati Jain commented on PIG-1530:
---------------------------------

a) This is not a developer coding issue. The example I gave is in fact a fairly simple one. Developer programs can be fairly complex, and it is not always easy for developers to do such optimizations on their own. One of the important advantages of an optimizer is that it removes the burden of thinking about these rewrites from the developer.

b) A general filter pushup rule (as you correctly observe) must be able to push a filter as far up as possible. The way this would work is by iteratively applying rules that push LOFilter across the individual relational operators. Simple rules must exist for pushing a filter above each relational operator; applied in conjunction, they allow a filter to be pushed up as far as it can go. As an example, after I added the rule for the above, I can see that in a program where the LOFilter is below a LOForeach-LOCogroup pair, the filter is pushed above LOCogroup. This is the result of applying PushUpFilter across LOForeach (which already exists as a separate rule) and then across LOCogroup.

c) Each relational operator has specifics that make it hard to write a single pattern, so each must be handled separately to ensure that the nuances of that operator are handled correctly. LOCogroup and LOJoin are both examples where the rules have fairly distinct logic. I do think, however, that there should be a single rule (with multiple patterns) which handles pushing up an LOFilter. That is why I have added the LOCogroup optimization to PushUpFilter instead of creating a separate rule.
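
As a rough illustration of the iterative application described in (b), here is a toy Python sketch, not the actual PushUpFilter code. It models a plan as a flat list of operator names from load to store, and the CAN_CROSS table is a hypothetical stand-in for the per-operator rules. It ignores the real legality checks (which depend on the filter expression) and the fact that crossing a cogroup places a filter on each input.

```python
# Hypothetical rule table: operators this toy filter is allowed to cross.
CAN_CROSS = {"foreach", "cogroup"}


def push_up_once(plan):
    """Swap one filter with the operator directly above it, if allowed."""
    for i in range(1, len(plan)):
        if plan[i] == "filter" and plan[i - 1] in CAN_CROSS:
            swapped = plan[:i - 1] + ["filter", plan[i - 1]] + plan[i + 1:]
            return swapped, True
    return plan, False


def push_up(plan):
    """Apply the single-step rule repeatedly until nothing changes."""
    changed = True
    while changed:
        plan, changed = push_up_once(plan)
    return plan


# "Above" means closer to the load, as in the discussion.
plan = ["load", "cogroup", "foreach", "filter", "store"]
result = push_up(plan)
print(result)  # ['load', 'filter', 'cogroup', 'foreach', 'store']
```

Each step only needs to know how to cross one operator. Repeating the step is what carries the filter past the foreach and then past the cogroup.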



[jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup

Posted by "Mridul Muralidharan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894368#action_12894368 ] 

Mridul Muralidharan commented on PIG-1530:
------------------------------------------

This looks more like a developer coding issue: the filter does not depend on the cogroup in any way, and it is a fairly specific pattern, in my opinion.


A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
B = load '<any file>' USING PigStorage(',') as (b1:int,b2:int,b3:int);
A1 = Filter A by a1 + 5 > a2;
B1 = Filter B by b1 + 5 > b2;

-- And the rest, including a possible cogroup.
G = COGROUP A1 by (a1,a2) , B1 by (b1,b2);
explain G;




[jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup

Posted by "Mridul Muralidharan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894388#action_12894388 ] 

Mridul Muralidharan commented on PIG-1530:
------------------------------------------

I can't edit comments, so I am adding to my previous comment here.
To clarify: a general pattern would not apply just to cogroup, but to most (if not all) operators which don't modify the relevant fields.
