Posted to user@avro.apache.org by Klüber, Ralf <Ra...@p3-group.com> on 2014/08/06 15:26:34 UTC

Creating and Reading Avro in Amazon EMR

Hello,

I am trying to

(i) read Avro files in Pig on Amazon EMR. I created these files in my local cluster from JSON documents (complex, nested, including arrays) and uploaded them to S3.

(ii) create Avro files in EMR from those complex JSON documents uploaded to S3.

In my local Cloudera cluster I was able to load and work with the data in the Avro files.

In Amazon EMR I was not able to load the existing Avro files.

My EMR cluster is
´´
AMI version: 3.0.4
Amazon 2.2.0
Hive 0.11.0.2
Pig 0.11.1.1
Impala 1.2.1
´´

I searched a lot, but I could not find much about EMR and Avro, and I am stuck. Is there an example somewhere, with data, schemas and Pig scripts, that I can try?

I hope this, my first post to this mailing list, meets your standards in terms of provided information and tone ;-). If not, I apologize and will try a second time.

In Pig I tried this:
´´
REGISTER s3://p3insight/libs/avro-1.7.4.jar;
-- REGISTER s3://p3insight/libs/pig/piggybank.jar;
REGISTER s3://p3insight/libs/jackson-mapper-asl-1.9.9.jar;
REGISTER s3://p3insight/libs/jackson-core-2.3.4.jar;
-- REGISTER s3://p3insight/libs/jackson-core-asl-1.9.9.jar;
REGISTER s3://p3insight/libs/json-simple-1.1.1.jar;
REGISTER /home/hadoop/pig/lib/piggybank.jar;

a = LOAD 's3://p3iqubole/data/avro/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
´´

Output is as follows
´´
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
Details at logfile: /mnt/var/log/apps/pig.log
´´

Content of log file is:
´´
Pig Stack Trace
---------------
ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'

Failed to parse: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1571)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1544)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:542)
        at org.apache.pig.Main.main(Main.java:159)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
        at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3235)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1314)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:798)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:516)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:391)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
        ... 15 more
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:618)
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:823)
        ... 21 more
Caused by: java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
        at java.lang.Class.getConstructor0(Class.java:2803)
        at java.lang.Class.newInstance(Class.java:345)
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:588)
        ... 22 more
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 27 more
´´



Kind regards.
Ralf Klüber


Re: Creating and Reading Avro in Amazon EMR

Posted by Michael Pigott <mp...@gmail.com>.
Hi Ralf,
    I'm sorry for not responding sooner; I have not used Avro for Pig
before so I did not know how helpful I would be.

That said, at the bottom of the stack trace is the error: *Caused by:
java.lang.ClassNotFoundException: org.json.simple.parser.ParseException*

Is that dependency set up correctly in your Amazon cluster?
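
One way to check this is sketched below. The last frames of the trace show the lookup going through sun.misc.Launcher$AppClassLoader, so it may be worth testing whether the class is visible on the classpath of the JVM that Pig itself runs in, and not only among the jars named in REGISTER. The class name ClasspathCheck and its locate helper are made up for this illustration and are not part of Pig or Avro. The sketch only reports whether a class can be loaded and from where; it does not fix anything.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Pig or Avro).
final class ClasspathCheck {

    /**
     * Returns the location a class is loaded from, or null if the class
     * cannot be found on the classpath this program was started with.
     */
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually have no code source.
            return src == null ? "(JDK bootstrap classes)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; other names can be passed as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.json.simple.parser.ParseException"};
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

Compile it on the master node and run it with the same classpath that Pig is started with (for example the jars under /home/hadoop/pig/lib plus whatever PIG_CLASSPATH contains; that this matches your cluster's layout is an assumption). If it prints NOT FOUND for org.json.simple.parser.ParseException, the json-simple jar is probably not on that classpath.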

Good luck!
Mike


On Thu, Aug 21, 2014 at 11:11 AM, Klüber, Ralf <Ra...@p3-group.com>
wrote:

>  Hello,
>
>
>
> I hope I did not step on anyone's toes with this mail. I have not received
> any feedback. Can someone give me a hint where to search, or point me in the
> right direction? Thanks in advance.
>
>  Kind regards.
>
> Ralf

Re: Creating and Reading Avro in Amazon EMR

Posted by Klüber, Ralf <Ra...@p3-group.com>.
Hello,

I hope I did not step on anyone's toes with this mail. I have not received any feedback. Can someone give me a hint where to search, or point me in the right direction? Thanks in advance.
Kind regards.
Ralf
