Posted to dev@pig.apache.org by "Richard Ding (JIRA)" <ji...@apache.org> on 2011/03/18 23:55:29 UTC

[jira] Created: (PIG-1920) An error in new parser

An error in new parser
----------------------

                 Key: PIG-1920
                 URL: https://issues.apache.org/jira/browse/PIG-1920
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.9.0
            Reporter: Richard Ding
            Assignee: Xuefu Zhang
             Fix For: 0.9.0


Run the following Pig script on trunk:

{code}
A = load 'input' as (v, u);
B = group A by $0;
C = group B by $0;
describe C;
R = foreach C generate B.A.v; 
describe R;
{code}

One gets this error:

{code}
C: {group: bytearray,B: {(group: bytearray,A: {(v: bytearray,u: bytearray)})}}
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Invalid field reference. Referenced field [v] does not exist in schema: A#19:bag{null#20:tuple(v#17:bytearray,u#18:bytearray)}.
{code}

Change the 5th line to:

{code}
R = foreach C generate B.A.$0; 
{code}

One gets this output:

{code}
C: {group: bytearray,B: {(group: bytearray,A: {(v: bytearray,u: bytearray)})}}
R: {{(A: {(v: bytearray,u: bytearray)})}}
{code}

This differs from the corresponding Pig 0.8 output, and is wrong:

{code}
C: {group: bytearray,B: {group: bytearray,A: {v: bytearray,u: bytearray}}}
R: {{v: bytearray}}
{code}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (PIG-1920) An error in new parser

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1920.
-----------------------------

    Resolution: Fixed

Actually, this is the right behavior; 0.8 is doing it wrong.

The schema we get for C is C: {group: bytearray,B: {(group: bytearray,A: {(v: bytearray,u: bytearray)})}}

The schema for B is: B: {(group: bytearray,A: {(v: bytearray,u: bytearray)})}

The schema for B.A is: B:{(A: {(v: bytearray,u: bytearray)})}

Note that B.A doesn't flatten a bag; it results in a two-level bag. v is in the inner bag and cannot be referenced.

B.A.$0 is valid, but it is equal to B.A.

In 0.8, describing B.A.v is valid; however, when we dump it we get B.A.$0, which is not the right result.

So the bug is actually in 0.8, and we fixed it in 0.9.
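
To make the two-level-bag point concrete, here is a toy model in Python. It is not Pig's implementation: the helper names project and project_pos and the sample values are made up for illustration, with a bag modeled as a list and a tuple as a dict. Under those assumptions, projecting A out of B keeps the outer bag, so v is not a field of the result, while position 0 of the result is A itself.

```python
# Toy model (an assumption, not Pig internals): bag = list, tuple = dict.

def project(bag, field):
    """Model of bag.field: keep the bag, narrow each tuple to one named field."""
    result = []
    for tup in bag:
        if field not in tup:
            raise KeyError(f"Referenced field [{field}] does not exist in {list(tup)}")
        result.append({field: tup[field]})
    return result


def project_pos(bag, index):
    """Model of bag.$index: same as project, but the field is chosen by position."""
    result = []
    for tup in bag:
        name = list(tup)[index]
        result.append({name: tup[name]})
    return result


# One hypothetical row of C, shaped like
# C: {group, B: {(group, A: {(v, u)})}}
c_row = {
    "group": "k1",
    "B": [
        {"group": "k1", "A": [{"v": "k1", "u": "x"}, {"v": "k1", "u": "y"}]},
    ],
}

# B.A: still a bag of tuples, each holding the inner bag A (two levels).
b_dot_a = project(c_row["B"], "A")

# B.A.$0: position 0 of each outer tuple is A, so nothing changes.
b_dot_a_0 = project_pos(b_dot_a, 0)

# B.A.v: the outer tuples only have the field A, so the lookup fails.
try:
    project(b_dot_a, "v")
    v_lookup_failed = False
except KeyError:
    v_lookup_failed = True
```

In this model, reaching v means projecting inside each inner bag, for example [project(t["A"], "v") for t in b_dot_a], rather than projecting on the outer bag.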


[jira] [Assigned] (PIG-1920) An error in new parser

Posted by "Xuefu Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang reassigned PIG-1920:
--------------------------------

    Assignee: Daniel Dai  (was: Xuefu Zhang)

It seems related to the 'null#u:int' issue.

