Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2010/12/17 23:30:01 UTC

[jira] Created: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result

changing statement corresponding to alias after explain , then doing dump gives incorrect result
------------------------------------------------------------------------------------------------

                 Key: PIG-1776
                 URL: https://issues.apache.org/jira/browse/PIG-1776
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Thejas M Nair
            Assignee: Thejas M Nair
             Fix For: 0.8.0


{code}
grunt> a = load '/tmp/t2.txt' as (str:chararray, num1:int, alph : chararray);
grunt> dump a;
(ABC,1,a)
(ABC,1,b)
(ABC,1,a)
(ABC,2,b)
(DEF,1,d)
(XYZ,1,x)

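grunt> b = group a by (str, num1); -- not in the original report; inferred from group.str, group.$1 and the counts below
grunt> c = foreach b  generate group.str, group.$1, COUNT(a.alph) ;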
grunt> dump c; -- gives correct results
(ABC,1,3)
(ABC,2,1)
(DEF,1,1)
(XYZ,1,1)

/* but dumping c after following steps gives incorrect results */

grunt> c = foreach b  generate group.$0 , (CHARARRAY)group.$1;                                                                                 
grunt> explain c;
...
...
grunt> c = foreach b  generate group.str, group.$1, COUNT(a.alph) ;
grunt> dump c;             
(ABC,1,0)
(ABC,2,0)
(DEF,1,0)
(XYZ,1,0)

{code}
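
The expected counts can be checked independently of Pig. The following is a minimal Python sketch, not Pig code. It assumes that b groups a on (str, num1), as the group.str and group.$1 references suggest, and it models COUNT as counting only non-null values. The helper name count_alph_per_group is hypothetical. With all three columns present it gives the correct counts; if alph were null in every row it would give the zeros seen in the incorrect output.

```python
# The six rows of /tmp/t2.txt as shown by `dump a`.
ROWS = [
    ("ABC", 1, "a"),
    ("ABC", 1, "b"),
    ("ABC", 1, "a"),
    ("ABC", 2, "b"),
    ("DEF", 1, "d"),
    ("XYZ", 1, "x"),
]


def count_alph_per_group(rows):
    """Group rows by (str, num1) and count the non-null alph values per group."""
    counts = {}  # dicts keep insertion order, so groups appear in first-seen order
    for s, num1, alph in rows:
        key = (s, num1)
        counts.setdefault(key, 0)
        if alph is not None:  # assumed model of COUNT skipping nulls
            counts[key] += 1
    return [(s, n, c) for (s, n), c in counts.items()]


# All three columns available: corresponds to the correct `dump c` output.
full = count_alph_per_group(ROWS)

# Third column null in every row: corresponds to the incorrect output.
pruned = count_alph_per_group([(s, n, None) for s, n, _ in ROWS])

print(full)
print(pruned)
```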

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987233#action_12987233 ] 

Thejas M Nair commented on PIG-1776:
------------------------------------

(adding more explanation to previous comment)
The root cause was that the UDFContext was not reset between plan regenerations.
The explain command set the requiredfields property of the load function so that only the first two fields were required. When the plan was regenerated for the dump command, the optimizer rules determined that all columns of the load statement were required, so they did not set the requiredfields property, and the value left behind by explain stayed in effect. As a result, the load still projected only the first two columns, the third column was null, and COUNT(a.alph) returned 0.
To fix this, the UDFContext is now reset each time the logical plan is cloned to regenerate the plan.
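
The failure mode described above can be illustrated with a toy model. The sketch below is plain Python and does not use Pig's classes: ToyContext, ToyLoader, plan_and_run, and the required_fields key are hypothetical stand-ins for the UDFContext, the load function, plan regeneration, and the requiredfields property. It only shows how a property set while planning one statement can leak into the next plan unless the shared context is reset first.

```python
class ToyContext:
    """Hypothetical stand-in for a property store shared across plans."""

    def __init__(self):
        self.props = {}

    def reset(self):
        self.props.clear()


class ToyLoader:
    """Hypothetical loader that reads its projection from the shared context."""

    def __init__(self, context):
        self.context = context

    def load(self, rows):
        required = self.context.props.get("required_fields")
        if required is None:
            return [tuple(r) for r in rows]  # no pruning requested
        # Fields outside the projection come back as None (null).
        return [tuple(v if i in required else None for i, v in enumerate(r))
                for r in rows]


def plan_and_run(context, rows, needed, reset_first):
    """Regenerate a 'plan' that needs the given column indexes, then load."""
    if reset_first:
        context.reset()  # the fix: clear state left by earlier plans
    width = len(rows[0])
    if set(needed) != set(range(width)):
        # The property is set only when some column can be pruned.
        context.props["required_fields"] = set(needed)
    return ToyLoader(context).load(rows)


rows = [("ABC", 1, "a"), ("DEF", 1, "d")]

# Without a reset, the two-column plan (the 'explain') affects the next plan.
ctx = ToyContext()
plan_and_run(ctx, rows, needed=[0, 1], reset_first=False)
stale = plan_and_run(ctx, rows, needed=[0, 1, 2], reset_first=False)

# With a reset before each plan, all three columns are loaded again.
ctx = ToyContext()
plan_and_run(ctx, rows, needed=[0, 1], reset_first=True)
fresh = plan_and_run(ctx, rows, needed=[0, 1, 2], reset_first=True)

print(stale)
print(fresh)
```

In this model the second plan never writes the property because nothing can be pruned, so only a reset removes the earlier value.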



[jira] Commented: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987306#action_12987306 ] 

Richard Ding commented on PIG-1776:
-----------------------------------

+1



[jira] Commented: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987250#action_12987250 ] 

Thejas M Nair commented on PIG-1776:
------------------------------------

test-patch and all unit tests pass. 
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.




[jira] Updated: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1776:
-------------------------------

    Attachment: PIG-1776.1.patch

The root cause was that the UDFContext was not reset between plan regenerations. The explain command set the requiredfields property of the load function so that only the first two fields were required. When the plan was regenerated for the dump command, the optimizer rules determined that all columns of the load statement were required, so they did not set the requiredfields property, and the value left behind by explain stayed in effect.
To fix this, the UDFContext is now reset each time the logical plan is cloned to regenerate the plan.




[jira] Resolved: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved PIG-1776.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0

Patch committed to the 0.8 branch and trunk.
