Posted to dev@pig.apache.org by Daniel Dai <da...@gmail.com> on 2011/01/12 23:38:42 UTC

Review Request: Schema reported from DESCRIBE and actual schema of inner bags are different.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/278/
-----------------------------------------------------------

Review request for pig and Richard Ding.


Summary
-------

The following script:

urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, pg:bytearray);
-- describe and dump are in-sync
DESCRIBE urlContents;
DUMP urlContents;

urlContentsG = GROUP urlContents BY url;
DESCRIBE urlContentsG;

urlContentsF = FOREACH urlContentsG GENERATE group,urlContents.pg;

DESCRIBE urlContentsF;
DUMP urlContentsF;

The DESCRIBE commands print:

urlContents: {url: chararray,pg: chararray}
urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
urlContentsF: {group: chararray,pg: {pg: chararray}}

The reported schemas for urlContentsG and urlContentsF are wrong: the inner bags are shown without the tuple that wraps their fields. They also contradict the section "Schemas for Complex Data Types" in http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_Schemas.

As expected, the actual data observed from DUMP urlContentsG and DUMP urlContentsF does contain the tuple inside the inner bags.

The correct schema for urlContentsG is: {group: chararray,urlContents: {t1:(url: chararray,pg: chararray)}}

This may sound like a technicality, but it isn't. For instance, a UDF that assumes an inner bag of {chararray} will not work with {(chararray)}. 
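
To make the UDF point concrete, here is a minimal sketch in plain Java. It does not use Pig's own DataBag or Tuple classes: a bag is modeled as a List<Object> and a tuple as a nested List, and the class and method names (InnerBagSketch, joinAssumingBareFields, joinAssumingTuples, sampleBag) are hypothetical, chosen only for illustration. It shows that code written against the reported shape {chararray} fails on data that actually has the shape {(chararray)}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a "bag" is a List<Object>, a "tuple" is a nested List.
// These are NOT Pig's DataBag/Tuple types; all names here are hypothetical.
class InnerBagSketch {

    // Written against the schema DESCRIBE reported, {pg: chararray}:
    // assumes every element of the bag is itself a string.
    public static String joinAssumingBareFields(List<Object> bag) {
        List<String> values = new ArrayList<>();
        for (Object element : bag) {
            // Throws ClassCastException when the element is really a tuple.
            values.add((String) element);
        }
        return String.join(",", values);
    }

    // Written against the actual data, {(pg: chararray)}:
    // each element is a tuple, and field 0 of that tuple is pg.
    public static String joinAssumingTuples(List<Object> bag) {
        List<String> values = new ArrayList<>();
        for (Object element : bag) {
            List<?> tuple = (List<?>) element;
            values.add((String) tuple.get(0));
        }
        return String.join(",", values);
    }

    // A bag shaped like the data DUMP shows: every element wrapped in a tuple.
    public static List<Object> sampleBag() {
        List<Object> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList("page-a"));
        bag.add(Arrays.<Object>asList("page-b"));
        return bag;
    }

    public static void main(String[] args) {
        List<Object> bag = sampleBag();
        System.out.println(joinAssumingTuples(bag));
        try {
            joinAssumingBareFields(bag);
        } catch (ClassCastException e) {
            System.out.println("bare-field assumption failed: ClassCastException");
        }
    }
}
```

A real UDF would work with Pig's bag and tuple types rather than these stand-ins, but the failure mode is the same: the shape the code expects has to match the shape of the data, so DESCRIBE needs to report the tuple level.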


This addresses bug PIG-767.
    https://issues.apache.org/jira/browse/PIG-767


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1057928 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOGenerate.java 1057928 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOInnerLoad.java 1057928 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLogicalPlanMigrationVisitor.java 1057928 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanLogToPhyTranslationVisitor.java 1057928 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSchema.java 1057928 

Diff: https://reviews.apache.org/r/278/diff


Testing
-------

Test-patch:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit-test:
    all pass.


Thanks,

Daniel


Re: Review Request: Schema reported from DESCRIBE and actual schema of inner bags are different.

Posted by Daniel Dai <da...@gmail.com>.

(Updated 2011-01-24 10:28:18.178511)


Review request for pig and Richard Ding.


Testing (updated)
-------

Test-patch:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit-test:
    all pass.

End-to-end test:
    all pass.


Thanks,

Daniel