Posted to user@pig.apache.org by Martin Goodson <ma...@qubitproducts.com> on 2012/10/12 17:48:24 UTC

How can I read Hive text files on S3 from Pig?

I am trying to load text files stored in Hive partitions on S3 using the
AllLoader function, with no success. I get an error indicating that
AllLoader expects the files to be on HDFS:

a = LOAD 's3n://xxxxx/yyyyy/zzz' using
org.apache.pig.piggybank.storage.AllLoader();
grunt> 2012-10-12 14:51:26,229 [main] ERROR
org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error.
Wrong FS: s3n://xxxxx/yyyyy/zzz, expected: hdfs://namenode.hadoop.companyname.com


Reading the files with PigStorage works fine, but PigStorage is not aware
of the Hive partition structure, so I cannot query the data this way
(I have to specify the file manually):

a = LOAD 's3n://xxxxx/yyyyy/zzzZ' using PigStorage();

Is there a way of reading Hive partitions from Pig over S3?

hive-0.9.0
pig-0.10.0
hadoop-0.20


Thank you
Martin
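
As a stopgap, the PigStorage route above can at least be pointed at all
partitions at once with a path glob; the partition value itself is then
lost, which is exactly the limitation described. A sketch only -- the
bucket, partition key, and schema below are made-up placeholders:

```pig
-- Workaround sketch; bucket, paths, and columns are placeholders.
-- Glob over the Hive partition directories so every partition is read,
-- at the cost of losing the partition column itself.
a = LOAD 's3n://mybucket/mytable/dt=*/*' USING PigStorage('\t')
        AS (user_id:chararray, clicks:long);
b = GROUP a ALL;
c = FOREACH b GENERATE COUNT(a);
```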

Re: How can I read Hive text files on S3 from Pig?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
The same underlying class is used by PigStorage in Pig 0.11, so we should
clean this up to make S3 users happy.

D


Re: How can I read Hive text files on S3 from Pig?

Posted by Martin Goodson <ma...@qubitproducts.com>.
Sure - thanks for having a look. By the way, I've moved to HCatalog and
things look like they are working.
Thanks again
Martin
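
[For readers landing here later: the HCatalog route replaces path-based
loading entirely. A sketch only -- the database, table, and partition
column names below are placeholders -- assuming the table is registered in
the Hive metastore:

```pig
-- Sketch only; table and column names are placeholders. HCatLoader reads
-- the table's location (HDFS or S3) and schema from the Hive metastore,
-- and exposes partition columns as ordinary fields.
a = LOAD 'mydb.mytable' USING org.apache.hcatalog.pig.HCatLoader();
b = FILTER a BY dt == '2012-10-12';  -- evaluated as partition pruning
```

Because the loader resolves the storage location through the metastore
rather than through a literal path, the HDFS-vs-S3 distinction never
reaches FileLocalizer.]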


Re: How can I read Hive text files on S3 from Pig?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Yeah, that's a bug in FileLocalizer; apparently it assumes local or
HDFS only. Could you file a JIRA?

D


Re: How can I read Hive text files on S3 from Pig?

Posted by Martin Goodson <ma...@qubitproducts.com>.
Hi Dmitriy,
here is the stack trace:

java.lang.IllegalArgumentException: Wrong FS: s3n://xxx/yyy/zz/, expected:
hdfs://namenode.adsf.companyname.com
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:433)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:523)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:820)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
        at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:316)
        at org.apache.pig.piggybank.storage.JsonMetadata.findMetaFile(JsonMetadata.java:94)
        at org.apache.pig.piggybank.storage.JsonMetadata.getSchema(JsonMetadata.java:154)
        at org.apache.pig.piggybank.storage.AllLoader.getSchema(AllLoader.java:400)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
        at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
        at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
        at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
        at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
        at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:495)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)


Thanks for taking a look. I will start looking into HCatalog too.

Martin



Re: How can I read Hive text files on S3 from Pig?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Martin,
Do you have the complete stack trace?
Generally, for Hive interop I recommend HCatalog; AllLoader is neat
but it's a 3rd party contrib and we don't really know it too well. I
can check out the error dump and see if there's anything obvious
though.

D
