Posted to dev@pig.apache.org by "Brandon Williams (Created) (JIRA)" <ji...@apache.org> on 2012/01/20 22:18:42 UTC

[jira] [Created] (PIG-2485) Unable to find alias in a bag with nested schema

Unable to find alias in a bag with nested schema
------------------------------------------------

                 Key: PIG-2485
                 URL: https://issues.apache.org/jira/browse/PIG-2485
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.2, 0.10
            Reporter: Brandon Williams


I've created a loadfunc that implements LoadMetadata and returns a schema as follows:

{noformat}
(key: bytearray,columns: {((name: chararray,owner_id: chararray))})
{noformat}

(the code is at CASSANDRA-3371 if you want to take a look)

However, whenever I try to access tuple fields within the bag, they cannot be found:

{noformat}
rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
one = filter rows by columns.owner_id eq 'foo';
dump one;
{noformat}
Produces:
{noformat}
2012-01-20 20:12:14,858 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field owner_id in :tuple(name:chararray,owner_id:chararray)
{noformat}

Replacing the bag with another tuple works and all the fields are accessible.  I've tried this against the 0.9 and 0.10 branch heads with no luck.  Trunk produces a slightly different error:
{noformat}
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1127: Index 1 out of range in schema::tuple(name:chararray,column_family:chararray)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1541)
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2485) Unable to find alias in a bag with nested schema

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196620#comment-13196620 ] 

Dmitriy V. Ryaboy commented on PIG-2485:
----------------------------------------

Pretty sure this is caused by your tuple double-wrapping. Schema should look like this:

(key: bytearray,columns: {(name: chararray,owner_id: chararray)})
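
To illustrate why the extra pair of parentheses matters, here is a minimal sketch in Python. It is a toy model of name lookup, not Pig's actual resolver: the find_field helper and the list-of-pairs schema representation are assumptions made up for this example. It only shows that a field is found when it sits directly in the bag's tuple, and is missed when that tuple instead holds a single unnamed inner tuple.

```python
# Toy model of schema lookup; this is NOT Pig's real resolver.
# A tuple schema is a list of (field_name, field_type) pairs, where
# field_type is either a type-name string or a nested tuple schema (a list).

def find_field(tuple_schema, name):
    """Return the type of `name` if it is a direct field of the tuple, else None."""
    for field_name, field_type in tuple_schema:
        if field_name == name:
            return field_type
    return None

inner = [("name", "chararray"), ("owner_id", "chararray")]

# {(name, owner_id)}: the bag's tuple holds the fields directly.
single_wrapped = inner

# {((name, owner_id))}: the bag's tuple holds one unnamed field, itself a tuple.
double_wrapped = [(None, inner)]

print(find_field(single_wrapped, "owner_id"))  # chararray
print(find_field(double_wrapped, "owner_id"))  # None
```

In the double-wrapped case the lookup only sees the single unnamed field, which matches the shape of the "Cannot find field owner_id" error reported above.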
                

        

[jira] [Resolved] (PIG-2485) Unable to find alias in a bag with nested schema

Posted by "Brandon Williams (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved PIG-2485.
-----------------------------------

    Resolution: Invalid

Closing this, since Dmitriy explained that a bag can only contain tuples with a single schema.  This is unfortunate since Cassandra can return columns with different schemas within its rows, but I'll work out a solution that doesn't need more than one schema for a bag.
                

        

[jira] [Commented] (PIG-2485) Unable to find alias in a bag with nested schema

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196626#comment-13196626 ] 

Brandon Williams commented on PIG-2485:
---------------------------------------

The problem is that my schema can contain many tuples, each with a different schema. If I define the bag as containing more than one tuple, ResourceSchema.validateSchema throws an InvalidSchemaException, since it checks that a bag has only one subfield.

As a more realistic example, my schema might look like:

(key: bytearray,columns: {(name: chararray,owner_id: chararray), (name: chararray,item_count: integer), (name: chararray,score: float)})
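
One way to fit such rows into a bag with a single tuple schema is to give every column the same generic shape, for example (name, value). The Python sketch below only illustrates that idea; it is not necessarily the approach CassandraStorage takes. The to_uniform_bag helper, the UNIFORM_SCHEMA string, and the sample row are all hypothetical.

```python
# Hypothetical workaround sketch; not necessarily what CassandraStorage does.

def to_uniform_bag(columns):
    """Turn {column_name: value} into a list of (name, value) tuples with one shape."""
    return [(name, value) for name, value in sorted(columns.items())]

# A single tuple schema for every element of the bag. The value field is assumed
# to use a generic type (bytearray here) so that differently typed columns fit.
UNIFORM_SCHEMA = "(key: bytearray,columns: {(name: chararray,value: bytearray)})"

# Sample row with differently typed columns (made-up data for illustration).
row = {"owner_id": "foo", "item_count": 3, "score": 0.5}
bag = to_uniform_bag(row)
print(bag)  # [('item_count', 3), ('owner_id', 'foo'), ('score', 0.5)]
```

The trade-off is that the per-column types are no longer expressed in the schema, so a script would have to cast each value after selecting it by name.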
                

        

[jira] [Updated] (PIG-2485) Unable to find alias in a bag with nested schema

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated PIG-2485:
----------------------------------

    Description: 
I've created a loadfunc that implements LoadMetadata and returns a schema as follows:

{noformat}
(key: bytearray,columns: {((name: chararray,owner_id: chararray))})
{noformat}

(the code is at CASSANDRA-3371 if you want to take a look)

However, whenever I try to access tuple fields within the bag, they cannot be found:

{noformat}
rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
one = filter rows by columns.owner_id eq 'foo';
dump one;
{noformat}
Produces:
{noformat}
2012-01-20 20:12:14,858 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field owner_id in :tuple(name:chararray,owner_id:chararray)
{noformat}

Replacing the bag with another tuple works and all the fields are accessible.  I've tried this against the 0.9 and 0.10 branch heads with no luck.  Trunk produces a slightly different error:
{noformat}
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1127: Index 1 out of range in schema::tuple(name:chararray,column_family:chararray)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1541)
{noformat}


  was:
I've created a loadfunc that implements LoadMetadata and returns a schema as follows:

{noformat}
(key: bytearray,columns: {((name: chararray,owner_id: chararray))})
{noformat}

(the code is at CASSANDRA-3371 if you want to take a look)

However, whenever I try to access tuple fields within the bag, they cannot be found:

{noformat}
rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
one = filter rows by columns.owner_id eq 'foo';
dump one;
{format}
Produces:
{noformat}
2012-01-20 20:12:14,858 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field owner_id in :tuple(name:chararray,owner_id:chararray)
{noformat}

Replacing the bag with another tuple works and all the fields are accessible.  I've tried this against the 0.9 and 0.10 branch heads with no luck.  Trunk produces a slightly different error:
{noformat}
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1127: Index 1 out of range in schema::tuple(name:chararray,column_family:chararray)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1541)
{noformat}


    