Posted to dev@pig.apache.org by "Swati Jain (JIRA)" <ji...@apache.org> on 2010/07/31 23:21:16 UTC
[jira] Created: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup
PIG Logical Optimization: Push LOFilter above LOCogroup
--------------------------------------------------------
Key: PIG-1530
URL: https://issues.apache.org/jira/browse/PIG-1530
Project: Pig
Issue Type: New Feature
Components: impl
Reporter: Swati Jain
Assignee: Swati Jain
Priority: Minor
Fix For: 0.8.0
Consider the following:
{noformat}
A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
B = load '<any file>' USING PigStorage(',') as (b1:int,b2:int,b3:int);
G = COGROUP A by (a1,a2) , B by (b1,b2);
D = Filter G by group.$0 + 5 > group.$1;
explain D;
{noformat}
In the example above, the LOFilter can be pushed above the LOCogroup, since its condition refers only to the group key. Note that there are some tricky NULL issues to think about when the COGROUP is not of type INNER (similar to the issues that must be thought through when pushing an LOFilter onto the right side of a left outer join).
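The equivalence can be checked on toy data. The following is a minimal Python sketch, not Pig's implementation: `cogroup`, `key_pred`, `filter_after` and `filter_before` are hypothetical helpers, `cogroup` models an outer (default) COGROUP on in-memory rows, and SQL-style null semantics are assumed (arithmetic or comparison with `None` yields `None`, and FILTER keeps only rows whose condition is true). Because the condition reads only key fields, filtering each input before grouping yields the same groups as filtering the COGROUP output. The sketch does not reproduce Pig's rules for grouping null keys; here any key containing a null makes the predicate null, so such rows are dropped on both paths anyway.

```python
from collections import defaultdict

def key_pred(k0, k1):
    """group.$0 + 5 > group.$1 with null propagation: None if an operand is None."""
    if k0 is None or k1 is None:
        return None
    return k0 + 5 > k1

def key(row):
    # Cogroup key: the first two fields, like (a1, a2) and (b1, b2) in the script.
    return (row[0], row[1])

def cogroup(a_rows, b_rows):
    """Outer COGROUP sketch: maps each key seen in either input to (a_bag, b_bag)."""
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[key(row)][0].append(row)
    for row in b_rows:
        groups[key(row)][1].append(row)
    return dict(groups)

def filter_after(a_rows, b_rows):
    # Original plan: COGROUP, then FILTER on the group key (keep only True).
    return {k: bags for k, bags in cogroup(a_rows, b_rows).items()
            if key_pred(*k) is True}

def filter_before(a_rows, b_rows):
    # Pushed-up plan: filter each input on its own key fields, then COGROUP.
    a1 = [r for r in a_rows if key_pred(*key(r)) is True]
    b1 = [r for r in b_rows if key_pred(*key(r)) is True]
    return cogroup(a1, b1)

A = [(1, 2, 10), (1, 2, 11), (1, 9, 12), (None, 3, 13)]
B = [(1, 2, 20), (4, 9, 21), (3, 7, 23)]

assert filter_after(A, B) == filter_before(A, B)
print(filter_after(A, B))
# {(1, 2): ([(1, 2, 10), (1, 2, 11)], [(1, 2, 20)]), (3, 7): ([], [(3, 7, 23)])}
```

Note that the key `(3, 7)` survives with an empty bag for `A` on both paths, so the outer-COGROUP semantics are preserved for a key-only condition. A condition that inspects the bags themselves (for example, whether a bag is empty) is where the NULL/outer subtleties mentioned above come in, and the sketch does not cover that case.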
Also note that in user programs the LOFilter will typically sit below a ForEach-Cogroup pair. To make this optimization really useful, we also need to implement pushing an LOFilter across a ForEach.
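Pushing a filter across a ForEach amounts to rewriting the filter condition in terms of the ForEach's input. Below is a minimal Python sketch under a hypothetical representation: each ForEach output field maps either to the input field it merely projects or to `None` when it is computed (such as `COUNT(A)`), and conditions are nested tuples. The filter can move above the ForEach only if every field it references is a plain projection. The names `FOREACH_F` and `rewrite` are illustrative, not Pig APIs.

```python
# Hypothetical ForEach model for
#   F = FOREACH G GENERATE group.$0 AS k0, group.$1 AS k1, COUNT(A) AS n;
# Each output field maps to the input field it projects, or to None when
# the value is computed and cannot be traced back to one input field.
FOREACH_F = {"k0": "group.$0", "k1": "group.$1", "n": None}

def rewrite(expr, foreach):
    """Rewrite a condition over the ForEach's output into one over its input.

    Expressions are nested tuples: ("field", name), ("const", value), or
    (operator, child, ...). Returns None if any referenced field is computed
    or unknown, meaning the filter must stay below the ForEach.
    """
    if expr[0] == "const":
        return expr
    if expr[0] == "field":
        source = foreach.get(expr[1])
        return None if source is None else ("field", source)
    children = [rewrite(child, foreach) for child in expr[1:]]
    if any(child is None for child in children):
        return None
    return (expr[0], *children)

# D = FILTER F BY k0 + 5 > k1;   k0 and k1 are plain projections: pushable
pushable = ("gt", ("add", ("field", "k0"), ("const", 5)), ("field", "k1"))
# D = FILTER F BY n > 1;         n is computed by COUNT(A): not pushable
blocked = ("gt", ("field", "n"), ("const", 1))

print(rewrite(pushable, FOREACH_F))
# ('gt', ('add', ('field', 'group.$0'), ('const', 5)), ('field', 'group.$1'))
print(rewrite(blocked, FOREACH_F))
# None
```

The rewritten condition is exactly the `group.$0 + 5 > group.$1` filter from the issue, which is what lets a rule for LOCogroup push it further up once the ForEach has been crossed.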
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-1530.
---------------------------------
Resolution: Duplicate
Xuefu is addressing this issue as part of https://issues.apache.org/jira/browse/PIG-1575.
[jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup
Posted by "Swati Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894467#action_12894467 ]
Swati Jain commented on PIG-1530:
---------------------------------
a) This is not a developer coding issue. The example I gave is in fact a fairly simple one. Real programs can be fairly complex, and it is not always easy for developers to perform such optimizations on their own. One of the important advantages of an optimizer is that it relieves the developer of having to think about these rewrites.
b) As you correctly observe, a general filter push-up rule must be able to push a filter as far up as possible. This works by iteratively pushing the LOFilter across relational operators one at a time. There must be simple rules for pushing a filter above individual relational operators; used in combination, these allow a filter to be pushed as far up as it can go. As an example, after adding the rule for the case above, I can see a program in which an LOFilter below an LOForeach-LOCogroup pair is pushed above the LOCogroup. This is the result of applying PushUpFilter across the LOCogroup together with the rule for pushing across the LOForeach (which already exists as a separate rule).
c) Each relational operator has specifics that make it hard to write a single pattern; each operator must be handled separately to ensure that its particular nuances are handled correctly. LOCogroup and LOJoin are both examples where the rules have fairly distinct logic. I do think, however, that there should be a single rule (with multiple patterns) that handles pushing up an LOFilter. That is why I added the LOCogroup optimization to PushUpFilter instead of creating a separate rule.
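Points b) and c) describe one push-up-filter rule with a separate pattern per operator, applied repeatedly until no further push is possible. The following toy Python sketch models only that structure and is not Pig's rule framework: the plan representation, `PUSH_UP_FILTER_PATTERNS`, `apply_once` and `optimize` are hypothetical. It also assumes a single chain of operators (COGROUP's second input is omitted) and field names that do not change across operators. For brevity the ForEach check sits in the same table, even though b) notes that it exists as a separate rule.

```python
# Toy plan: a single chain of operators listed from source to sink.
# Simplifications (assumptions): COGROUP's second input is omitted, and
# field names are assumed unchanged across operators.

def cross_foreach(flt, op):
    # A filter may cross a ForEach if it reads only fields projected unchanged.
    return flt["fields"] <= op["projected"]

def cross_cogroup(flt, op):
    # A filter may cross a COGROUP if it reads only group-key fields.
    return flt["fields"] <= op["key_fields"]

# One push-up-filter rule with one pattern (check) per operator kind.
PUSH_UP_FILTER_PATTERNS = {"FOREACH": cross_foreach, "COGROUP": cross_cogroup}

def apply_once(plan):
    """Fire the first matching pattern; return True if the plan changed."""
    for i in range(1, len(plan)):
        flt, above = plan[i], plan[i - 1]
        if flt["kind"] != "FILTER":
            continue
        check = PUSH_UP_FILTER_PATTERNS.get(above["kind"])
        if check is not None and check(flt, above):
            plan[i - 1], plan[i] = flt, above
            return True
    return False

def optimize(plan):
    # Each firing moves a FILTER one step toward the source and never moves
    # a FILTER back down, so the loop terminates.
    while apply_once(plan):
        pass
    return [op["kind"] for op in plan]

def make_plan(filter_fields):
    return [
        {"kind": "LOAD"},
        {"kind": "COGROUP", "key_fields": {"k0", "k1"}},
        {"kind": "FOREACH", "projected": {"k0", "k1"}},
        {"kind": "FILTER", "fields": set(filter_fields)},
    ]

print(optimize(make_plan({"k0", "k1"})))  # ['LOAD', 'FILTER', 'COGROUP', 'FOREACH']
print(optimize(make_plan({"n"})))         # ['LOAD', 'COGROUP', 'FOREACH', 'FILTER']
```

The first plan shows the behaviour described in b): the filter first crosses the ForEach, and only then does the COGROUP pattern match. A filter on a computed field (`n`) matches no pattern and stays where it is.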
[jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup
Posted by "Mridul Muralidharan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894368#action_12894368 ]
Mridul Muralidharan commented on PIG-1530:
------------------------------------------
This looks more like a developer coding issue: the filter does not depend on the cogroup in any way, and this is a fairly specific pattern, IMO. It could just as well be written as:
{noformat}
A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
B = load '<any file>' USING PigStorage(',') as (b1:int,b2:int,b3:int);
A1 = Filter A by a1 + 5 > a2;
B1 = Filter B by b1 + 5 > b2;
-- ... and the rest of the script, including any cogroup:
G = COGROUP A1 by (a1,a2) , B1 by (b1,b2);
explain G;
{noformat}
[jira] Commented: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup
Posted by "Mridul Muralidharan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894388#action_12894388 ]
Mridul Muralidharan commented on PIG-1530:
------------------------------------------
Can't edit comments, so adding to my previous comment:
To clarify: a general pattern would not stop at COGROUP alone, but would apply to most (if not all) operators that don't modify the relevant fields.
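A minimal Python sketch of this generic condition, assuming each operator can report which fields it passes through unchanged (the `Operator` class, its attributes and `can_push_above` are hypothetical, not Pig classes). Field preservation alone is not always sufficient: a LIMIT modifies no fields, yet filtering before it changes which rows the limit keeps. The sketch therefore lets an operator opt out explicitly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    kind: str
    preserved: frozenset           # fields passed through unchanged
    blocks_pushdown: bool = False  # explicit opt-out, e.g. LIMIT

def can_push_above(filter_fields, op):
    """Generic check: the filter reads only fields that op leaves untouched."""
    if op.blocks_pushdown:
        return False
    return frozenset(filter_fields) <= op.preserved

ALL = frozenset({"a1", "a2", "a3"})
ORDER = Operator("ORDER", ALL)
DISTINCT = Operator("DISTINCT", ALL)
FOREACH = Operator("FOREACH", frozenset({"a1"}))  # a2, a3 replaced by computed values
LIMIT = Operator("LIMIT", ALL, blocks_pushdown=True)

for op in (ORDER, DISTINCT, FOREACH, LIMIT):
    print(op.kind, can_push_above({"a1", "a2"}, op))
# ORDER True
# DISTINCT True
# FOREACH False
# LIMIT False
```

This single field-based check is the generic alternative to writing a separate pattern per operator; operators with extra nuances (such as the NULL handling of a non-INNER COGROUP discussed above) would still need more than this check.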