Posted to dev@pig.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2010/08/25 23:38:16 UTC
[jira] Created: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly
----------------------------------------------------------------------------------------------
Key: PIG-1568
URL: https://issues.apache.org/jira/browse/PIG-1568
Project: Pig
Issue Type: Bug
Reporter: Xuefu Zhang
Fix For: 0.8.0
The FilterAboveForeach rule optimizes the plan by pushing a filter above the preceding foreach operator. However, during code review, two major problems were found:
1. The current implementation assumes that if no projection is found in the filter condition, then all columns from the foreach are projected. This assumption prevents the following script from being optimized:
A = LOAD 'file.txt' AS (a(u,v), b, c);
B = FOREACH A GENERATE $0, b;
C = FILTER B BY 8 > 5;
STORE C INTO 'empty';
2. The current implementation doesn't handle the * projection, which means "project all columns". As a result, it is unable to optimize the following script:
A = LOAD 'file.txt' AS (a(u,v), b, c);
B = FOREACH A GENERATE $0, b;
C = FILTER B BY Identity.class.getName(*) > 5;
STORE C INTO 'empty';
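The two cases above can be illustrated with a minimal sketch of the push-up check. This is not Pig's implementation: GenerateExpr, STAR, columns_used and can_push_filter_above are hypothetical names, and the model assumes a foreach without flatten or nondeterministic expressions. A filter is treated as movable when every foreach output column it reads is a plain projection of an input column.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Union

STAR = "*"  # hypothetical marker for "project all columns"


@dataclass(frozen=True)
class GenerateExpr:
    """One expression in a FOREACH ... GENERATE list (hypothetical model)."""
    # Input column index if the expression is a plain projection, else None.
    source_column: Optional[int]


def columns_used(filter_refs: Sequence[Union[int, str]],
                 foreach_width: int) -> FrozenSet[int]:
    """Resolve which foreach output columns the filter condition reads."""
    if STAR in filter_refs:
        # Problem 2: a star must expand to every output column.
        return frozenset(range(foreach_width))
    # Problem 1: no references means no columns are read, not all of them.
    return frozenset(filter_refs)


def can_push_filter_above(generate: Sequence[GenerateExpr],
                          filter_refs: Sequence[Union[int, str]]) -> bool:
    """True if every column the filter reads is a plain projection."""
    used = columns_used(filter_refs, len(generate))
    return all(generate[i].source_column is not None for i in used)


# B = FOREACH A GENERATE $0, b;  (both outputs are plain projections)
projection_only = [GenerateExpr(0), GenerateExpr(1)]
# A foreach whose second output is computed, for contrast.
with_computed = [GenerateExpr(0), GenerateExpr(None)]

constant_filter = []    # FILTER B BY 8 > 5;  reads no columns
star_filter = [STAR]    # FILTER B BY <udf>(*) > 5;  reads all columns
```

In this model both scripts from the description are movable, while a star filter over a foreach with a computed column is not.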
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1568:
-----------------------------
Status: Open (was: Patch Available)
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1568:
-----------------------------
Attachment: jira-1568-1.patch
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1568:
----------------------------
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed
test-patch result:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Patch committed. Thanks Xuefu!
[jira] Assigned: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang reassigned PIG-1568:
--------------------------------
Assignee: Xuefu Zhang
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1568:
-----------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1568:
-----------------------------
Attachment: jira-1568-1.patch
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is
too restrictive and doesn't handle project * correctly
Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1568:
-----------------------------
Status: Patch Available (was: Open)
Regenerated the patch after fixing the failed test case. The test case itself was changed because it relied on an internal bug: when a UDF takes no arguments, the Pig backend passes the whole input to the UDF. This needs to be corrected. In other words, if a UDF doesn't specify any arguments, we assume that it doesn't need any input. If a UDF needs all of the input, it can either specify a star (*) or list whatever it requires in the argument list.
A Jira tracking Pig backend changes will be created.
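The argument convention proposed above could be sketched as follows. This only models the intended semantics and is not Pig backend code; udf_input and the "*" marker are hypothetical names chosen for illustration.

```python
from typing import List, Tuple, Union


def udf_input(row: Tuple, args: List[Union[int, str]]) -> Tuple:
    """Select the part of the input row that a UDF call should receive."""
    if not args:
        # No arguments: the UDF is assumed to need no input at all.
        return ()
    if "*" in args:
        # Star: the UDF receives every column of the input row.
        return tuple(row)
    # Explicit list: the UDF receives only the columns it names.
    return tuple(row[i] for i in args)


example_row = ("x", 7, 3.5)
no_args = udf_input(example_row, [])
all_cols = udf_input(example_row, ["*"])
some_cols = udf_input(example_row, [0, 2])
```

Under this convention a call with no arguments no longer counts as reading any column, which is what lets the optimizer treat such a filter as independent of the foreach output.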