Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2010/12/17 23:30:01 UTC
[jira] Created: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result
changing statement corresponding to alias after explain , then doing dump gives incorrect result
------------------------------------------------------------------------------------------------
Key: PIG-1776
URL: https://issues.apache.org/jira/browse/PIG-1776
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0
{code}
grunt> a = load '/tmp/t2.txt' as (str:chararray, num1:int, alph : chararray);
grunt> dump a;
(ABC,1,a)
(ABC,1,b)
(ABC,1,a)
(ABC,2,b)
(DEF,1,d)
(XYZ,1,x)
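grunt> b = group a by (str, num1); -- assumed: the definition of b is missing from the archived report
grunt> c = foreach b generate group.str, group.$1, COUNT(a.alph) ;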
grunt> dump c; -- gives correct results
(ABC,1,3)
(ABC,2,1)
(DEF,1,1)
(XYZ,1,1)
/* but dumping c after following steps gives incorrect results */
grunt> c = foreach b generate group.$0 , (CHARARRAY)group.$1;
grunt> explain c;
...
...
grunt> c = foreach b generate group.str, group.$1, COUNT(a.alph) ;
grunt> dump c;
(ABC,1,0)
(ABC,2,0)
(DEF,1,0)
(XYZ,1,0)
{code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987233#action_12987233 ]
Thejas M Nair commented on PIG-1776:
------------------------------------
(adding more explanation to previous comment)
The root cause of the problem was that the UDFContext objects were not reset between plan regenerations.
The explain command set the requiredfields property of the load function so that only the first two fields were required. When the plan was regenerated for the dump command, the optimizer rules determined that all columns in the load statement were required, so they did not set the requiredfields property, and the stale value left by explain stayed in effect. As a result, the load projected only the first two columns and the third column was null. COUNT ignores nulls, which is why every count came out as 0.
The fix resets the UDFContext each time the logical plan is cloned to regenerate the plan.
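The mechanism can be sketched with a toy model in Python. This is only an illustration of stale state surviving between plan generations, not Pig's actual code: `LoaderContext`, `build_plan`, `load`, and `run` are hypothetical names, and the real logic lives in Pig's Java classes.

```python
# Toy model of the bug. None of these names are Pig's real API.

class LoaderContext:
    """Stands in for per-loader properties kept across plan generations."""

    def __init__(self):
        self.properties = {}

    def reset(self):
        self.properties.clear()


def build_plan(context, needed_columns, total_columns):
    """Mimic the optimizer: record a projection only when columns can be pruned."""
    if len(needed_columns) < total_columns:
        context.properties["requiredfields"] = sorted(needed_columns)
    # When every column is needed nothing is written, so an old value survives.


def load(context, rows):
    """Mimic the loader: columns outside requiredfields come back as None."""
    required = context.properties.get("requiredfields")
    if required is None:
        return [tuple(row) for row in rows]
    return [
        tuple(value if index in required else None for index, value in enumerate(row))
        for row in rows
    ]


def run(rows, reset_between_plans):
    context = LoaderContext()
    build_plan(context, {0, 1}, 3)      # "explain c" on the two-column statement
    if reset_between_plans:
        context.reset()                 # the fix: clear state before regenerating
    build_plan(context, {0, 1, 2}, 3)   # "dump c" on the three-column statement
    return load(context, rows)


ROWS = [("ABC", 1, "a"), ("ABC", 2, "b")]
buggy = run(ROWS, reset_between_plans=False)
fixed = run(ROWS, reset_between_plans=True)
print(buggy)
print(fixed)
```

Without the reset, the third column is dropped on the second run even though the new statement needs it. With the reset, all three columns are loaded.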
[jira] Commented: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987306#action_12987306 ]
Richard Ding commented on PIG-1776:
-----------------------------------
+1
[jira] Commented: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987250#action_12987250 ]
Thejas M Nair commented on PIG-1776:
------------------------------------
test-patch and all unit tests pass.
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[jira] Updated: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1776:
-------------------------------
Attachment: PIG-1776.1.patch
The root cause of the problem was that the UDFContext objects were not reset between plan regenerations. The explain command set the requiredfields property of the load function so that only the first two fields were required. When the plan was regenerated for the dump command, the optimizer rules determined that all columns in the load statement were required, so they did not set the requiredfields property, and the stale value left by explain stayed in effect.
The fix resets the UDFContext each time the logical plan is cloned to regenerate the plan.
[jira] Resolved: (PIG-1776) changing statement corresponding to alias after explain , then doing dump gives incorrect result
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair resolved PIG-1776.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.9.0
Patch committed to 0.8 branch and trunk