Posted to dev@pig.apache.org by Daniel Dai <da...@gmail.com> on 2011/01/12 23:38:42 UTC
Review Request: Schema reported from DESCRIBE and actual schema of inner
bags are different.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/278/
-----------------------------------------------------------
Review request for pig and Richard Ding.
Summary
-------
The following script:
urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, pg:bytearray);
-- describe and dump are in-sync
DESCRIBE urlContents;
DUMP urlContents;
urlContentsG = GROUP urlContents BY url;
DESCRIBE urlContentsG;
urlContentsF = FOREACH urlContentsG GENERATE group,urlContents.pg;
DESCRIBE urlContentsF;
DUMP urlContentsF;
Prints for the DESCRIBE commands:
urlContents: {url: chararray,pg: chararray}
urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
urlContentsF: {group: chararray,pg: {pg: chararray}}
The reported schemas for urlContentsG and urlContentsF are wrong: they omit the tuple inside the inner bag. They also contradict the section "Schemas for Complex Data Types" in http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_Schemas.
As expected, the actual data observed from DUMP urlContentsG and DUMP urlContentsF does contain the tuple inside the inner bags.
The correct schema for urlContentsG is: {group: chararray,urlContents: {t1:(url: chararray,pg: chararray)}}
This may sound like a technicality, but it isn't. For instance, a UDF that assumes an inner bag of {chararray} will not work with {(chararray)}.
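To make the UDF point concrete, here is a minimal, self-contained Java sketch. It does not use the Pig API: a bag is modeled as a plain List, a tuple as a nested List, and the class and method names (InnerBagSketch, concatAssumingBareValues, concatAssumingTuples, bagOfTuples) are hypothetical stand-ins for UDF code. It only illustrates why code written against the reported shape {chararray} breaks on data that actually has the shape {(chararray)}.

```java
import java.util.ArrayList;
import java.util.List;

public class InnerBagSketch {

    // Hypothetical stand-in for a UDF written against the schema that
    // DESCRIBE reports: every bag element is assumed to be a bare chararray.
    static String concatAssumingBareValues(List<Object> bag) {
        StringBuilder sb = new StringBuilder();
        for (Object element : bag) {
            // Throws ClassCastException when the element is really a tuple.
            sb.append((String) element);
        }
        return sb.toString();
    }

    // Hypothetical stand-in for a UDF written against the actual data:
    // every bag element is a tuple whose first field is the chararray.
    static String concatAssumingTuples(List<Object> bag) {
        StringBuilder sb = new StringBuilder();
        for (Object element : bag) {
            List<?> tuple = (List<?>) element;
            sb.append((String) tuple.get(0));
        }
        return sb.toString();
    }

    // Builds a bag shaped like {(chararray)}: each value is wrapped in
    // a one-field tuple before being added to the bag.
    static List<Object> bagOfTuples(String... values) {
        List<Object> bag = new ArrayList<>();
        for (String value : values) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(value);
            bag.add(tuple);
        }
        return bag;
    }

    public static void main(String[] args) {
        List<Object> bag = bagOfTuples("page1", "page2");

        // Code that unwraps the tuple works on the real data.
        System.out.println(concatAssumingTuples(bag));

        // Code that trusts the reported schema fails at runtime.
        try {
            concatAssumingBareValues(bag);
        } catch (ClassCastException e) {
            System.out.println("bare-value code failed: element is a tuple");
        }
    }
}
```

In a real UDF the same mismatch would show up when code casts a bag element directly to String instead of first reading it as a tuple and then taking its field.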
This addresses bug PIG-767.
https://issues.apache.org/jira/browse/PIG-767
Diffs
-----
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1057928
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOGenerate.java 1057928
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOInnerLoad.java 1057928
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLogicalPlanMigrationVisitor.java 1057928
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanLogToPhyTranslationVisitor.java 1057928
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSchema.java 1057928
Diff: https://reviews.apache.org/r/278/diff
Testing
-------
Test-patch:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 9 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Unit-test:
all pass.
Thanks,
Daniel
Re: Review Request: Schema reported from DESCRIBE and actual schema of inner
bags are different.
Posted by Daniel Dai <da...@gmail.com>.
(Updated 2011-01-24 10:28:18.178511)
Review request for pig and Richard Ding.
Testing (updated)
-------
Test-patch:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 9 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Unit-test:
all pass.
End-to-end test:
all pass.
Thanks,
Daniel