Posted to user@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/06/22 03:57:23 UTC

How do you load data from S3 on Amazon EMR with Pig 0.10.0?

My script is simple:

/* Avro */
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/avro-1.5.3.jar
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/json-simple-1.1.jar
register /home/hadoop/pig-0.10.0/contrib/piggybank/java/piggybank.jar
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

emails = LOAD 's3://rjurney_public_web/hadoop/enron.avro' using AvroStorage();


The error confuses me. Why can't I load data from S3?

2012-06-22 01:52:50,893 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Invalid hostname in URI s3://rjurney_public_web/hadoop/enron.avro
2012-06-22 01:52:50,893 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.IllegalArgumentException: Invalid hostname in URI s3://rjurney_public_web/hadoop/enron.avro
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:41)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:436)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1327)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1345)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
    at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
    at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188)
    at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:466)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
    at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
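
A note on the "Invalid hostname in URI" message in the trace above: the exception is raised from S3Credentials.initialize, which, as far as can be told from the trace, rejects the URI when it has no usable host. One possible contributor, separate from any Pig-side issue, is the underscore in the bucket name: java.net.URI does not accept underscores in a server-based host, so getHost() returns null for such a URI even though getAuthority() still carries the bucket name. The sketch below only demonstrates that java.net.URI behavior; the class name UriHostCheck and the hyphenated bucket name are hypothetical, used purely for comparison.

```java
import java.net.URI;

public class UriHostCheck {
    // Returns the host component that java.net.URI extracts, or null if it finds none.
    static String hostOf(String location) {
        return URI.create(location).getHost();
    }

    // Returns the raw authority component (here, the bucket name as written).
    static String authorityOf(String location) {
        return URI.create(location).getAuthority();
    }

    public static void main(String[] args) {
        // Path from the original script: the bucket name contains underscores.
        String withUnderscore = "s3://rjurney_public_web/hadoop/enron.avro";
        // Hypothetical bucket name with hyphens, for comparison only.
        String withHyphen = "s3://rjurney-public-web/hadoop/enron.avro";

        System.out.println(hostOf(withUnderscore));      // null
        System.out.println(authorityOf(withUnderscore)); // rjurney_public_web
        System.out.println(hostOf(withHyphen));          // rjurney-public-web
    }
}
```

If this is the cause, a bucket whose name is a valid hostname would be expected to get past this particular check; whether the rest of the load then succeeds on Pig 0.10.0 is a separate question.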

Re: How do you load data from S3 on Amazon EMR with Pig 0.10.0?

Posted by Russell Jurney <ru...@gmail.com>.
Oh, it is https://issues.apache.org/jira/browse/PIG-2539


Re: How do you load data from S3 on Amazon EMR with Pig 0.10.0?

Posted by Russell Jurney <ru...@gmail.com>.
cd s3://elasticmapreduce/samples/pig-apache/input/

2012-06-22 01:58:56,685 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. This file system object (hdfs://10.4.115.51:9000) does not support access to the request path 's3://elasticmapreduce/samples/pig-apache/input' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.

Wait a minute... we fixed this.  I fixed this.  Why isn't it in Pig 0.10?
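
To illustrate what the error message is describing: a filesystem object obtained from the default configuration (here hdfs://10.4.115.51:9000) is asked to handle a path with a different scheme (s3://), and refuses. The sketch below is a simplified stand-in for that kind of scheme and authority comparison, written with only java.net.URI; it is not Hadoop's actual code, and the class FsPathCheck and method supports are hypothetical names.

```java
import java.net.URI;

public class FsPathCheck {
    // Simplified illustration: does a filesystem rooted at fsUri accept this path?
    // A path without a scheme is assumed to resolve against the filesystem itself.
    static boolean supports(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true;
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. an s3:// path handed to an hdfs:// filesystem
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://10.4.115.51:9000");
        URI s3Path = URI.create("s3://elasticmapreduce/samples/pig-apache/input");
        URI localPath = URI.create("/user/hadoop"); // hypothetical scheme-less path

        System.out.println(supports(defaultFs, s3Path));    // false
        System.out.println(supports(defaultFs, localPath)); // true
    }
}
```

The remedy the message itself suggests is to obtain the filesystem from the path's own URI, via FileSystem.get(uri, conf), rather than from the default configuration alone.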
