Posted to dev@pig.apache.org by "Brandon Williams (Created) (JIRA)" <ji...@apache.org> on 2012/01/20 22:18:42 UTC
[jira] [Created] (PIG-2485) Unable to find alias in a bag with nested schema
Unable to find alias in a bag with nested schema
------------------------------------------------
Key: PIG-2485
URL: https://issues.apache.org/jira/browse/PIG-2485
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.2, 0.10
Reporter: Brandon Williams
I've created a loadfunc that implements LoadMetadata and returns a schema as follows:
{noformat}
(key: bytearray,columns: {((name: chararray,owner_id: chararray))})
{noformat}
(the code is at CASSANDRA-3371 if you want to take a look)
However, whenever I try to access tuple fields within the bag, they cannot be found:
{noformat}
rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
one = filter rows by columns.owner_id eq 'foo';
dump one;
{noformat}
Produces:
{noformat}
2012-01-20 20:12:14,858 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field owner_id in :tuple(name:chararray,owner_id:chararray)
{noformat}
Replacing the bag with another tuple works, and all the fields are accessible. I've tried this against the 0.9 and 0.10 branch heads with no luck. Trunk produces a slightly different error:
{noformat}
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse:
<file foo.pig, line 2, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1127: Index 1 out of range in schema::tuple(name:chararray,column_family:chararray)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1541)
{noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2485) Unable to find alias in a bag with nested schema
Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196620#comment-13196620 ]
Dmitriy V. Ryaboy commented on PIG-2485:
----------------------------------------
Pretty sure this is caused by your tuple double-wrapping. Schema should look like this:
(key: bytearray,columns: {(name: chararray,owner_id: chararray)})
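To illustrate the double-wrapping point, here is a minimal sketch in Python. It is a toy model only: field() and find_in_bag() are hypothetical helpers, not Pig's ResourceSchema API, and the lookup rule (search only the fields of the bag's tuple) is an assumption made for illustration rather than a trace of Pig's resolver.

```python
# Toy model of a Pig-style schema. Hypothetical helpers, not Pig's API.

def field(alias, type_name, inner=()):
    """One schema field: an alias, a type name, and any nested fields."""
    return {"alias": alias, "type": type_name, "inner": list(inner)}


def find_in_bag(bag, alias):
    """Resolve bag.alias by looking only at the fields of the bag's tuple."""
    (row,) = bag["inner"]  # the bag's single tuple schema
    for f in row["inner"]:
        if f["alias"] == alias:
            return f
    raise KeyError("Cannot find field %s" % alias)


name = field("name", "chararray")
owner_id = field("owner_id", "chararray")

# {((name, owner_id))}: the bag's tuple holds one unnamed tuple field.
double_wrapped = field(
    "columns", "bag",
    [field(None, "tuple", [field(None, "tuple", [name, owner_id])])])

# {(name, owner_id)}: the bag's tuple holds the two named fields directly.
single_wrapped = field(
    "columns", "bag", [field(None, "tuple", [name, owner_id])])

# The single-wrapped lookup succeeds; the double-wrapped one does not,
# because owner_id sits one level deeper than the lookup searches.
print(find_in_bag(single_wrapped, "owner_id")["type"])
try:
    find_in_bag(double_wrapped, "owner_id")
except KeyError as err:
    print("lookup failed:", err)
```

In this model the extra tuple layer is the only difference between the two schemas, which matches the shape of the suggested fix.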
[jira] [Resolved] (PIG-2485) Unable to find alias in a bag with nested schema
Posted by "Brandon Williams (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams resolved PIG-2485.
-----------------------------------
Resolution: Invalid
Closing this, since Dmitriy explained that a bag can only contain tuples with a single schema. This is unfortunate since Cassandra can return columns with different schemas within its rows, but I'll work out a solution that doesn't need more than one schema for a bag.
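One shape that needs only a single schema per bag is to give every column the same (name, value) layout and carry the value as raw bytes. The Python sketch below shows that idea only. It is an assumption for illustration, not the approach taken in CASSANDRA-3371, which this thread does not describe; to_uniform_columns() is a hypothetical helper.

```python
def to_uniform_columns(columns):
    """Turn (name, value) pairs of mixed value types into (str, bytes) pairs.

    Every output tuple has the same two-field layout, so a single
    (name, value) schema would describe the whole bag.
    """
    return [(name, str(value).encode("utf-8")) for name, value in columns]


# Columns of one row, each with a differently typed value.
row = [("owner_id", "foo"), ("item_count", 3), ("score", 1.5)]
uniform = to_uniform_columns(row)
print(uniform)
```

The cost of this layout is that the consumer has to know how to decode each value itself, since the type information no longer lives in the schema.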
[jira] [Commented] (PIG-2485) Unable to find alias in a bag with nested schema
Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196626#comment-13196626 ]
Brandon Williams commented on PIG-2485:
---------------------------------------
The problem is that my schema can contain many tuples, each with a different schema. If I define the bag as containing more than one tuple, ResourceSchema.validateSchema throws an InvalidSchemaException, since it checks that a bag has only one subfield.
As a more realistic example, my schema might look like:
(key: bytearray,columns: {(name: chararray,owner_id: chararray), (name: chararray,item_count: integer), (name: chararray,score: float)})
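The constraint described here can be sketched in Python as follows. This is a toy stand-in, not Pig's code: validate_bag() and InvalidSchemaError are hypothetical names, and the only rule modeled is the one stated above, that a bag may declare exactly one subfield.

```python
class InvalidSchemaError(Exception):
    """Hypothetical analogue of the InvalidSchemaException mentioned above."""


def validate_bag(tuple_schemas):
    """Accept a bag only if it declares exactly one tuple schema."""
    if len(tuple_schemas) != 1:
        raise InvalidSchemaError(
            "bag must have exactly one subfield, got %d" % len(tuple_schemas))
    return tuple_schemas[0]


# One tuple shape for the whole bag: accepted.
one_shape = [[("name", "chararray"), ("owner_id", "chararray")]]

# Three tuple shapes, as in the example schema above: rejected.
three_shapes = [
    [("name", "chararray"), ("owner_id", "chararray")],
    [("name", "chararray"), ("item_count", "integer")],
    [("name", "chararray"), ("score", "float")],
]

print(validate_bag(one_shape))
try:
    validate_bag(three_shapes)
except InvalidSchemaError as err:
    print("rejected:", err)
```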
[jira] [Updated] (PIG-2485) Unable to find alias in a bag with nested schema
Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated PIG-2485:
----------------------------------
Description: corrected the closing {noformat} tag after the Pig script, which had been typed as {format}. The text is otherwise identical to the original report above.