Posted to dev@pig.apache.org by "Vivek Padmanabhan (JIRA)" <ji...@apache.org> on 2011/02/24 07:43:38 UTC

[jira] Created: (PIG-1868) New logical plan fails when I have complex data types from udf

New logical plan fails when I have complex data types from udf
--------------------------------------------------------------

                 Key: PIG-1868
                 URL: https://issues.apache.org/jira/browse/PIG-1868
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Vivek Padmanabhan


The new logical plan fails when my eval function returns complex data types.

My script is below:

{code}
register myudf.jar;   
B1 = load 'myinput' as (id:chararray,ts:int,url:chararray);
B2 = group B1 by id;
B = foreach B2 {
 Tuples = order B1 by ts;
 generate Tuples;
};
C1 = foreach B generate TransformToMyDataType(Tuples,-1,0,1) as seq: { t: ( previous, current, next ) };
C2 = foreach C1 generate FLATTEN(seq);
C3 = foreach C2 generate  current.id as id;
dump C3;
{code}

At C3 it fails with the following message:
{code}
Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 45 Input: 0 Column: 1)
{code}

Below is the describe output for C1:
{code}
C1: {seq: {t: (previous: (id: chararray,ts: int,url: chararray),current: (id: chararray,ts: int,url: chararray),next: (id: chararray,ts: int,url: chararray))}}
{code}

The script works if I turn off the new logical plan or use Pig 0.7.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011387#comment-13011387 ] 

Thejas M Nair commented on PIG-1868:
------------------------------------

+1


> New logical plan fails when I have complex data types from udf
> --------------------------------------------------------------
>
>                 Key: PIG-1868
>                 URL: https://issues.apache.org/jira/browse/PIG-1868
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>             Fix For: 0.9.0
>
>         Attachments: PIG-1868-1.patch
>

[jira] Updated: (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1868:
--------------------------------

    Fix Version/s: 0.8.0
         Assignee: Daniel Dai

> New logical plan fails when I have complex data types from udf
> --------------------------------------------------------------
>
>                 Key: PIG-1868
>                 URL: https://issues.apache.org/jira/browse/PIG-1868
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>

[jira] Commented: (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006760#comment-13006760 ] 

Daniel Dai commented on PIG-1868:
---------------------------------

Got the actual implementation of TransformToMyDataType from the user. The problem is that TransformToMyDataType forgets to set the twolevelaccess flag for the bag. Pig 0.9 does not require the twolevelaccess flag. However, we saw Pig 0.9 fail with a different stack:

ERROR 1000: Invalid field reference. Referenced field [guid] does not exist in schema: null.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias C3
at org.apache.pig.PigServer.explain(PigServer.java:993)
at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:368)
at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:300)
at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:263)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:665)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:325)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:537)
at org.apache.pig.Main.main(Main.java:108)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 1000: Invalid field reference. Referenced field [guid] does not exist in schema: null.
at org.apache.pig.newplan.logical.visitor.ColumnAliasConversionVisitor$1.visit(ColumnAliasConversionVisitor.java:114)
at org.apache.pig.newplan.logical.expression.DereferenceExpression.accept(DereferenceExpression.java:83)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:114)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:104)
at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1538)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1533)
at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1295)
at org.apache.pig.PigServer.buildStorePlan(PigServer.java:1195)
at org.apache.pig.PigServer.explain(PigServer.java:956)


[jira] Updated: (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1868:
----------------------------

    Affects Version/s:     (was: 0.8.0)
        Fix Version/s:     (was: 0.8.0)

[jira] Updated: (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1868:
----------------------------

    Affects Version/s: 0.9.0
        Fix Version/s: 0.9.0

[jira] Updated: (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1868:
----------------------------

    Attachment: PIG-1868-1.patch

[jira] Commented: (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007663#comment-13007663 ] 

Daniel Dai commented on PIG-1868:
---------------------------------

The failure on trunk occurs because, in this query, TransformToMyDataType returns a more specific schema than the user-declared schema "seq: { t: ( previous, current, next ) }", and we now take only the user-declared schema. This is a regression; I will attach a patch to fix it.
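
To illustrate the idea of combining the two schemas, here is a minimal sketch in plain Java. It is not the code from PIG-1868-1.patch and does not use Pig's Schema classes: Field, merge, resolve, declaredSchema and udfSchema are hypothetical names invented for this example. It assumes that a field the user declared by name only (such as "current") should pick up its type and inner fields from the schema the UDF reports, while the user's aliases are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of schema merging; not Pig's actual implementation.
class SchemaMergeSketch {

    /** Simplified field schema: an alias, an optional type, and nested fields. */
    static final class Field {
        final String alias;
        final String type;          // null means "the user did not declare a type"
        final List<Field> children; // empty for scalars and for name-only fields

        Field(String alias, String type, List<Field> children) {
            this.alias = alias;
            this.type = type;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }

        static Field of(String alias, String type, Field... children) {
            return new Field(alias, type, Arrays.asList(children));
        }

        /** Follows a chain of aliases below this field; returns null if a step is missing. */
        Field resolve(String... path) {
            Field current = this;
            for (String alias : path) {
                Field next = null;
                for (Field child : current.children) {
                    if (child.alias.equals(alias)) {
                        next = child;
                        break;
                    }
                }
                if (next == null) {
                    return null;
                }
                current = next;
            }
            return current;
        }
    }

    /** Keeps the user's aliases and fills in whatever the user left unspecified from the UDF schema. */
    static Field merge(Field declared, Field fromUdf) {
        if (declared.type == null) {
            // Name-only declaration: adopt the UDF's type and inner structure.
            return new Field(declared.alias, fromUdf.type, fromUdf.children);
        }
        if (declared.children.isEmpty()) {
            // Fully declared field with no inner fields: keep it as declared.
            return declared;
        }
        if (declared.children.size() != fromUdf.children.size()) {
            throw new IllegalArgumentException("Schemas differ in shape at " + declared.alias);
        }
        List<Field> merged = new ArrayList<>();
        for (int i = 0; i < declared.children.size(); i++) {
            merged.add(merge(declared.children.get(i), fromUdf.children.get(i)));
        }
        return new Field(declared.alias, declared.type, merged);
    }

    /** Models the user-declared schema: seq: { t: ( previous, current, next ) } */
    static Field declaredSchema() {
        return Field.of("seq", "bag",
                Field.of("t", "tuple",
                        Field.of("previous", null),
                        Field.of("current", null),
                        Field.of("next", null)));
    }

    /** One (id, ts, url) tuple, as in the describe output for C1. */
    static Field triple(String alias) {
        return Field.of(alias, "tuple",
                Field.of("id", "chararray"),
                Field.of("ts", "int"),
                Field.of("url", "chararray"));
    }

    /** Models the more specific schema that the UDF is assumed to report. */
    static Field udfSchema() {
        return Field.of("seq", "bag",
                Field.of("t", "tuple", triple("previous"), triple("current"), triple("next")));
    }

    public static void main(String[] args) {
        Field merged = merge(declaredSchema(), udfSchema());
        Field id = merged.resolve("t", "current", "id");
        System.out.println(id == null ? "current.id not found" : "current.id: " + id.type);
    }
}
```

With only the declared schema, looking up "current.id" finds nothing, which corresponds to the invalid field reference seen on trunk. After merging, the lookup succeeds because "current" carries the inner fields reported by the UDF.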

[jira] [Resolved] (PIG-1868) New logical plan fails when I have complex data types from udf

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1868.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Review notes: https://reviews.apache.org/r/526/

Patch committed to trunk.
