Posted to dev@pig.apache.org by "Ankur (JIRA)" <ji...@apache.org> on 2009/11/26 09:00:40 UTC

[jira] Created: (PIG-1112) FLATTEN eliminates the alias

FLATTEN eliminates the alias
----------------------------

                 Key: PIG-1112
                 URL: https://issues.apache.org/jira/browse/PIG-1112
             Project: Pig
          Issue Type: Bug
            Reporter: Ankur
             Fix For: 0.6.0


If the schema for a field of type 'bag' is only partially defined, FLATTEN() incorrectly drops the flattened field from the resulting schema, so a later reference to its alias fails.
Consider the following example:

A = LOAD 'sample' using PigStorage() as (first:chararray, second:chararray, ladder:bag{});
B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second;
C = GROUP B by (first,third);

This throws the error:
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: third in {first: chararray,second: chararray}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (PIG-1112) FLATTEN eliminates the alias

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-1112:
-------------------------------

    Assignee: Alan Gates  (was: Daniel Dai)



[jira] Updated: (PIG-1112) FLATTEN eliminates the alias

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1112:
--------------------------------

    Fix Version/s: 0.9.0



[jira] Commented: (PIG-1112) FLATTEN eliminates the alias

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928889#action_12928889 ] 

Alan Gates commented on PIG-1112:
---------------------------------

Daniel, I don't understand the choice here.  I think we agreed that if the user specifies (third, second) as the schema then we take that to mean there are two bytearray fields and we project them to guarantee this.  So

{code}
B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second;
{code}

will now be equivalent to:
{code}
Bprime = FOREACH A GENERATE first,FLATTEN(ladder);
B = FOREACH Bprime GENERATE first, $1 as third, $2 as second;
{code}



[jira] Commented: (PIG-1112) FLATTEN eliminates the alias

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784330#action_12784330 ] 

Pradeep Kamath commented on PIG-1112:
-------------------------------------

Pig doesn't handle partial schemas well - the fix for this issue will depend on how we want to treat unknown schemas. I did verify that this works when the schema specified is complete:

{code}
A = LOAD 'sample' using PigStorage() as (first:chararray, second:chararray, ladder:bag{t:tuple(x:int)});
B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second;
C = GROUP B by (first,third);
describe C;
{code}

Here's the output:
C: {group: (first: chararray,third: int),B: {first: chararray,third: int,second: chararray}}



[jira] Commented: (PIG-1112) FLATTEN eliminates the alias

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912624#action_12912624 ] 

Alan Gates commented on PIG-1112:
---------------------------------

In the example above, the user specified that two fields are expected to come out of the flatten of ladder.  This seems equivalent to saying A = load 'ladder' as (third, second).  So I propose that when users give field names (and possibly types) in an AS clause attached to a FLATTEN, Pig takes that to be the schema of the flattened data.
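
Editor's illustration (not part of the original comment): the proposed rule can be modelled as a small schema-resolution function. This is a sketch under stated assumptions, written in Python purely as neutral notation. The name flatten_schema, the tuple representation of fields, and the choice to let an explicit AS type win over the bag's declared type are all hypothetical and are not taken from Pig's code base. The bytearray default for untyped fields follows the behaviour described elsewhere in this thread.

```python
from typing import List, Optional, Tuple

# A declared bag field: (alias, type name).
Field = Tuple[str, str]
# A field named in an AS clause: (alias, type name or None if no type was given).
AsField = Tuple[str, Optional[str]]


def flatten_schema(
    inner: Optional[List[Field]],
    as_clause: Optional[List[AsField]],
) -> Optional[List[Field]]:
    """Model the schema produced by FLATTEN(bag) [AS ...] under the proposal.

    `inner` is the bag's tuple schema, or None when it is unknown (bag{}).
    A return value of None means the flattened schema is unknown.
    """
    if as_clause is None:
        # No AS clause: the result inherits whatever the bag declares.
        return inner
    if inner is not None and len(inner) != len(as_clause):
        raise ValueError("AS clause does not match the bag's declared schema")

    resolved: List[Field] = []
    for position, (alias, declared_type) in enumerate(as_clause):
        if declared_type is not None:
            # Assumption: an explicit type in AS takes precedence.
            field_type = declared_type
        elif inner is not None:
            # Known inner schema: keep its type, rename the field.
            field_type = inner[position][1]
        else:
            # Unknown inner schema: fall back to bytearray.
            field_type = "bytearray"
        resolved.append((alias, field_type))
    return resolved
```

With an unknown inner schema and AS (third, second), this model yields two bytearray fields. With a complete schema such as bag{t:tuple(x:int)} and AS third, it yields third: int, which agrees with the DESCRIBE output reported for the fully specified case.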



[jira] Updated: (PIG-1112) FLATTEN eliminates the alias

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1112:
--------------------------------

    Fix Version/s:     (was: 0.6.0)

Moving out of 0.6.0 release. The right way to run this query is to specify the complete schema for the bag. We are not sure how we should be dealing with partial schemas and need to figure out the overall strategy before fixing individual issues.



[jira] Commented: (PIG-1112) FLATTEN eliminates the alias

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928837#action_12928837 ] 

Daniel Dai commented on PIG-1112:
---------------------------------

In current trunk, the schema for B becomes:
B: {first: chararray,third: bytearray,second: chararray}

The alias for FLATTEN(ladder) is right, but we need to decide whether to mandate bytearray as the type for "third" or to treat the entire schema for B as unknown.



[jira] Assigned: (PIG-1112) FLATTEN eliminates the alias

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1112:
-----------------------------------

    Assignee: Daniel Dai
