Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2009/10/14 18:25:31 UTC

[jira] Created: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

optimizer pushes filter before the foreach that generates column used by filter
-------------------------------------------------------------------------------

                 Key: PIG-1022
                 URL: https://issues.apache.org/jira/browse/PIG-1022
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Thejas M Nair


grunt> l = load 'students.txt' using PigStorage() as (name:chararray, gender:chararray, age:chararray, score:chararray);
grunt> f = foreach l generate name, gender, age,score, '200'  as gid:chararray;
grunt> g = group f by (name, gid);
grunt> f2 = foreach g generate group.name as name: chararray, group.gid as gid: chararray;
grunt> filt = filter f2 by gid == '200';
grunt> explain filt;

In the generated plan, filt is pushed up to just after the load and before the first foreach, even though the filter is on gid, which is generated in the first foreach.
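
To make the effect concrete, here is a minimal Python sketch (not Pig code) that evaluates the script as written and then with the filter placed where the plan puts it. The sample rows are made up for illustration. The pushed version compares column 0 (name) against '200', which is what the Filter operator in the explain output posted in this thread projects, since the load output has no gid column.

```python
# Hypothetical sample rows standing in for students.txt:
# (name, gender, age, score)
rows = [
    ("alice", "F", "20", "3.5"),
    ("bob", "M", "21", "3.1"),
    ("alice", "F", "20", "3.9"),
]

def as_written(rows):
    # f: append the constant gid column
    f = [r + ("200",) for r in rows]
    # g + f2: group by (name, gid) and keep only the group key
    f2 = sorted({(r[0], r[4]) for r in f})
    # filt: filter on gid, which only exists after f
    return [t for t in f2 if t[1] == "200"]

def as_pushed(rows):
    # The filter now runs on the load output, where column 0 is name
    filtered = [r for r in rows if r[0] == "200"]
    f = [r + ("200",) for r in filtered]
    return sorted({(r[0], r[4]) for r in f})

print(as_written(rows))  # [('alice', '200'), ('bob', '200')]
print(as_pushed(rows))   # []
```

With this data the script as written keeps every group, while the pushed filter drops every row.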

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766294#action_12766294 ] 

Daniel Dai commented on PIG-1022:
---------------------------------

It seems the "project fixer up" requires nested fields to be counted as mapped fields, whereas for pushupfilter, nested fields should not be counted as mapped fields. We need to clarify the definition of projectMap.mappedFields first.
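
One way to picture the two conflicting requirements is the Python sketch below. It is only an illustration under assumptions: `FieldOrigin`, `mapped_for_fixup` and `mapped_for_pushup` are hypothetical names, and this is not how projectMap.mappedFields is actually represented in Pig.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class FieldOrigin:
    """Hypothetical record of where one output column comes from."""
    output_col: int
    input_col: int
    nested: bool  # True if read from inside a tuple, e.g. group.gid

def mapped_for_fixup(origins: List[FieldOrigin]) -> List[int]:
    # The fixer-up view: nested projections still count as mapped.
    return [o.output_col for o in origins]

def mapped_for_pushup(origins: List[FieldOrigin]) -> List[int]:
    # The filter push-up view: only top-level copies count as mapped.
    return [o.output_col for o in origins if not o.nested]

# f2 = foreach g generate group.name, group.gid
# Both output columns are read from inside input column 0 (the group tuple).
f2_origins = [FieldOrigin(0, 0, True), FieldOrigin(1, 0, True)]

print(mapped_for_fixup(f2_origins))   # [0, 1]
print(mapped_for_pushup(f2_origins))  # []
```

The point of the sketch is only that a single "mapped fields" list cannot serve both consumers unless it also records whether a field is nested.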



[jira] Closed: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1022.
---------------------------




[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765682#action_12765682 ] 

Daniel Dai commented on PIG-1022:
---------------------------------

Actually, we cannot push the filter even before f2. Since we do not keep track of the source of data inside a tuple, gid should be treated as a generated field of f2. However, the projection map of f2 gives us the wrong result: it reports gid as a directly mapped field of group (which is a tuple (name, gid)), and everything that follows stems from this. The fix for this problem is to modify the projection map generation logic for mapped fields.

Santhosh, do you have any comment?
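
A minimal sketch of the push-up check described in this comment, written in Python rather than Pig's Java and assuming a much simplified projection map. `ProjectionMap`, `can_push_filter_above` and the two example maps are hypothetical names for illustration; they are not Pig's actual classes and not the contents of the patch.

```python
from typing import Dict, Optional, Set

# Hypothetical model: output column index -> index of the input column it is
# copied from unchanged, or None if the operator generates that column.
ProjectionMap = Dict[int, Optional[int]]

def can_push_filter_above(pmap: ProjectionMap, filter_cols: Set[int]) -> bool:
    """A filter may move above an operator only if every column it reads
    is a direct copy of one of the operator's input columns."""
    return all(pmap.get(c) is not None for c in filter_cols)

# f2 = foreach g generate group.name as name, group.gid as gid
# Input column 0 of f2 is the tuple `group`; the filter reads output column 1.
gid_col = {1}

# Map as described for the bug: gid reported as directly mapped to group.
buggy_f2: ProjectionMap = {0: 0, 1: 0}

# Map as proposed: fields taken from inside a tuple are treated as generated.
fixed_f2: ProjectionMap = {0: None, 1: None}

print(can_push_filter_above(buggy_f2, gid_col))  # True
print(can_push_filter_above(fixed_f2, gid_col))  # False
```

Under the first map the filter is allowed to move above f2 and keeps travelling upward; under the second it stays below f2.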



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768120#action_12768120 ] 

Hadoop QA commented on PIG-1022:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422754/PIG-1022-1.patch
  against trunk revision 827829.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/104/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/104/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/104/console

This message is automatically generated.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1022:
----------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771157#action_12771157 ] 

Daniel Dai commented on PIG-1022:
---------------------------------

The core test failure is temporary, due to a "port conflict".



[jira] Assigned: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-1022:
-------------------------------

    Assignee: Daniel Dai



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784582#action_12784582 ] 

Alan Gates commented on PIG-1022:
---------------------------------

Patch checked into 0.6 branch as well.



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770831#action_12770831 ] 

Hadoop QA commented on PIG-1022:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422754/PIG-1022-1.patch
  against trunk revision 830335.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/119/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/119/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/119/console

This message is automatically generated.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1022:
----------------------------

    Attachment: PIG-1022-1.patch

Attached the patch. Thanks to Santhosh for helping analyze the problem.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1022:
----------------------------

        Fix Version/s: 0.6.0
    Affects Version/s: 0.4.0
               Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784279#action_12784279 ] 

Alan Gates commented on PIG-1022:
---------------------------------

+1, patch looks good.  I also reran the unit tests since it had been a while since this patch was posted, and they all still pass.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1022:
----------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.6.0)
                   0.7.0
           Status: Resolved  (was: Patch Available)

Checked that patch in for Daniel since I'd already applied it and tested it.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1022:
----------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768312#action_12768312 ] 

Daniel Dai commented on PIG-1022:
---------------------------------

Core tests pass when run manually.



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765612#action_12765612 ] 

Thejas M Nair commented on PIG-1022:
------------------------------------

${code}
grunt> explain filt;
#-----------------------------------------------
# Logical Plan:
#-----------------------------------------------

Store 1-1162 Schema: {name: chararray,gid: chararray} Type: Unknown
|
|---ForEach 1-1148 Schema: {name: chararray,gid: chararray} Type: bag
    |   |
    |   Project 1-1144 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
    |   Input: Project 1-1145 Projections: [0] Overloaded: false|
    |   |---Project 1-1145 Projections: [0] Overloaded: false FieldSchema: group: tuple({name: chararray,gid: chararray}) Type: tuple
    |       Input: CoGroup 1-1138
    |   |
    |   Project 1-1146 Projections: [1] Overloaded: false FieldSchema: gid: chararray Type: chararray
    |   Input: Project 1-1147 Projections: [0] Overloaded: false|
    |   |---Project 1-1147 Projections: [0] Overloaded: false FieldSchema: group: tuple({name: chararray,gid: chararray}) Type: tuple
    |       Input: CoGroup 1-1138
    |
    |---CoGroup 1-1138 Schema: {group: (name: chararray,gid: chararray),f: {name: chararray,gender: chararray,age: chararray,score: chararray,gid: chararray}} Type: bag
        |   |
        |   Project 1-1136 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
        |   Input: ForEach 1-1135
        |   |
        |   Project 1-1137 Projections: [4] Overloaded: false FieldSchema: gid: chararray Type: chararray
        |   Input: ForEach 1-1135
        |
        |---ForEach 1-1135 Schema: {name: chararray,gender: chararray,age: chararray,score: chararray,gid: chararray} Type: bag
            |   |
            |   Project 1-1130 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
            |   Input: Filter 1-1152
            |   |
            |   Project 1-1131 Projections: [1] Overloaded: false FieldSchema: gender: chararray Type: chararray
            |   Input: Filter 1-1152
            |   |
            |   Project 1-1132 Projections: [2] Overloaded: false FieldSchema: age: chararray Type: chararray
            |   Input: Filter 1-1152
            |   |
            |   Project 1-1133 Projections: [3] Overloaded: false FieldSchema: score: chararray Type: chararray
            |   Input: Filter 1-1152
            |   |
            |   Const 1-1134( 200 ) FieldSchema: chararray Type: chararray
            |
            |---Filter 1-1152 Schema: {name: chararray,gender: chararray,age: chararray,score: chararray} Type: bag
                |   |
                |   Equal 1-1151 FieldSchema: boolean Type: boolean
                |   |
                |   |---Project 1-1149 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
                |   |   Input: ForEach 1-1161
                |   |
                |   |---Const 1-1150( 200 ) FieldSchema: chararray Type: chararray
                |
                |---ForEach 1-1161 Schema: {name: chararray,gender: chararray,age: chararray,score: chararray} Type: bag
                    |   |
                    |   Cast 1-1154 FieldSchema: name: chararray Type: chararray
                    |   |
                    |   |---Project 1-1153 Projections: [0] Overloaded: false FieldSchema: name: bytearray Type: bytearray
                    |       Input: Load 1-1123
                    |   |
                    |   Cast 1-1156 FieldSchema: gender: chararray Type: chararray
                    |   |
                    |   |---Project 1-1155 Projections: [1] Overloaded: false FieldSchema: gender: bytearray Type: bytearray
                    |       Input: Load 1-1123
                    |   |
                    |   Cast 1-1158 FieldSchema: age: chararray Type: chararray
                    |   |
                    |   |---Project 1-1157 Projections: [2] Overloaded: false FieldSchema: age: bytearray Type: bytearray
                    |       Input: Load 1-1123
                    |   |
                    |   Cast 1-1160 FieldSchema: score: chararray Type: chararray
                    |   |
                    |   |---Project 1-1159 Projections: [3] Overloaded: false FieldSchema: score: bytearray Type: bytearray
                    |       Input: Load 1-1123
                    |
                    |---Load 1-1123 Schema: {name: bytearray,gender: bytearray,age: bytearray,score: bytearray} Type: bag

${code}
