Posted to user@pig.apache.org by Andrew Musselman <an...@gmail.com> on 2015/05/08 00:41:33 UTC

Parquet

I'm trying to read a Parquet file in Pig, using parquet-mr jars built from
master. Should I be building from a release tag instead?

The Pig version is the 0.14 binary release.

grunt> register /home/akm/parquet-mr/parquet-*/target/parquet-*-1.8.0-SNAPSHOT.jar;
grunt> a = load '/home/akm/record.parquet' using org.apache.parquet.pig.ParquetLoader;
2015-05-07 15:39:41,860 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-05-07 15:39:41,878 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2218: Invalid resource schema: bag schema must have tuple as its field
Details at logfile: /home/akm/pig_1431036955635.log

And in that logfile:

Pig Stack Trace
---------------
ERROR 2218: Invalid resource schema: bag schema must have tuple as its field

Failed to parse: Can not retrieve schema from loader org.apache.parquet.pig.ParquetLoader@1be72d8
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:201)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:558)
        at org.apache.pig.Main.main(Main.java:170)
Caused by: java.lang.RuntimeException: Can not retrieve schema from loader org.apache.parquet.pig.ParquetLoader@1be72d8
        at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:91)
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:901)
        at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        ... 10 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: Cannot get schema from loadFunc org.apache.parquet.pig.ParquetLoader
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:179)
        at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
        ... 17 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2218: Invalid resource schema: bag schema must have tuple as its field
        at org.apache.pig.ResourceSchema$ResourceFieldSchema.throwInvalidSchemaException(ResourceSchema.java:216)
        at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1916)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:176)
        ... 18 more
================================================================================
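Java prints a chained exception outermost first, so the last `Caused by:` entry in a trace like this one is the root cause. Here that entry is ERROR 2218, thrown while `Schema.getPigSchema` converts the schema that ParquetLoader reported. The following is a minimal sketch for pulling that entry out of a saved Pig log. The helper name `root_cause` is hypothetical and not part of Pig. The sketch assumes the standard `printStackTrace` layout, and it also tolerates the hard-wrapped lines that mail clients introduce:

```python
def root_cause(trace: str) -> str:
    """Return the innermost 'Caused by:' message from a Java stack trace.

    Java prints causes outermost first, so the last 'Caused by:' entry is
    the root cause. A message hard-wrapped onto following lines is joined
    back together, stopping at the first stack frame ("at ...") or the
    elision line ("... N more").
    """
    lines = [line.strip() for line in trace.splitlines()]
    starts = [i for i, line in enumerate(lines) if line.startswith("Caused by:")]
    if not starts:
        # No chained cause: the first non-empty line is the only message.
        return next((line for line in lines if line), "")
    last = starts[-1]
    parts = [lines[last][len("Caused by:"):].strip()]
    for line in lines[last + 1:]:
        # A bare "at" appears when the frame text was wrapped onto the next line.
        if not line or line == "at" or line.startswith(("at ", "...")):
            break
        parts.append(line)
    return " ".join(parts)
```

If you read the log file named in the Grunt output and pass its contents to this function, it should return the `FrontendException: ERROR 2218 ...` line.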

Re: Parquet

Posted by Andrew Musselman <an...@gmail.com>.
I also just noticed that the field names in these files were written with
whitespace in them, and that some contain raw strings describing data types,
such as "(INT_16)", in case that makes a difference.
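Pig Latin identifiers must start with a letter and may contain only letters, digits, and underscores. Names containing whitespace or parentheses therefore could not be written as field references in a Pig script. This thread does not establish whether parquet-pig rejects, renames, or passes such names through. The following is a minimal sketch that flags offending column names. The function `invalid_pig_names` and the sample names are hypothetical, since the real column names are not shown:

```python
import re

# Pig Latin identifier rule: a letter, then any letters, digits, or underscores.
PIG_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def invalid_pig_names(field_names):
    """Return the names that are not valid Pig Latin identifiers, in input order."""
    return [name for name in field_names if not PIG_IDENTIFIER.fullmatch(name)]


# Hypothetical column names in the style described above.
print(invalid_pig_names(["a", "sensor reading", "count (INT_16)", "b_2"]))
# ['sensor reading', 'count (INT_16)']
```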

On Thu, May 7, 2015 at 3:46 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:

> Here's the schema of that record.parquet file (edited for brevity):
>
> $ ~/hadoop-2.6.0/bin/hadoop jar ~/parquet-mr/parquet-tools/target/parquet-tools-1.8.0-SNAPSHOT.jar schema ~/record.parquet
> message some/message/tokens path/identifying/data/location {
>   repeated double a;
>   repeated double b;
>   repeated float c;
>   ...
>   repeated double z;
> }

Re: Parquet

Posted by Andrew Musselman <an...@gmail.com>.
Here's the schema of that record.parquet file (edited for brevity):

$ ~/hadoop-2.6.0/bin/hadoop jar ~/parquet-mr/parquet-tools/target/parquet-tools-1.8.0-SNAPSHOT.jar schema ~/record.parquet
message some/message/tokens path/identifying/data/location {
  repeated double a;
  repeated double b;
  repeated float c;
  ...
  repeated double z;
}
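Every column in this schema is a top-level `repeated` primitive, and Pig has no bare list type. A bag holds tuples, and Pig rejects any bag schema whose field is not a tuple; that is the ERROR 2218 text in the log. One plausible reading (an assumption, not confirmed in this thread) is that each `repeated double` column surfaces as a bag whose element is the bare `double` rather than a one-field tuple. The sketch models only that rule. `PigField`, `check_bag_rule`, and `repeated_to_bag` are hypothetical and do not mirror Pig's or parquet-pig's actual classes:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PigField:
    """Hypothetical, simplified stand-in for a Pig field schema (not Pig's API)."""
    name: Optional[str]
    kind: str                                   # e.g. "double", "tuple", "bag"
    inner: List["PigField"] = field(default_factory=list)


def check_bag_rule(f: PigField) -> None:
    """Raise if any bag in the tree declares a field that is not a single tuple.

    An empty `inner` list models a bag with no declared inner schema, which
    this sketch accepts.
    """
    if f.kind == "bag" and f.inner and (len(f.inner) != 1 or f.inner[0].kind != "tuple"):
        raise ValueError("bag schema must have tuple as its field")
    for child in f.inner:
        check_bag_rule(child)


def repeated_to_bag(name: str, primitive: str, wrap_in_tuple: bool) -> PigField:
    """Map a Parquet `repeated <primitive> name;` column to a Pig bag (hypothetical)."""
    value = PigField(name, primitive)
    element = PigField(None, "tuple", [value]) if wrap_in_tuple else value
    return PigField(name, "bag", [element])


check_bag_rule(repeated_to_bag("a", "double", wrap_in_tuple=True))  # accepted
try:
    check_bag_rule(repeated_to_bag("a", "double", wrap_in_tuple=False))
except ValueError as err:
    print(err)  # bag schema must have tuple as its field
```

In Pig schema notation, the accepted shape corresponds roughly to `a: {(a: double)}`, a bag of one-field tuples.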


On Thu, May 7, 2015 at 3:41 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:

> I'm trying to read a parquet file in Pig, using parquet-mr jars built from
> master.  Should I be building from a release tag?
>
> Pig version is binary 0.14.