Posted to user@pig.apache.org by Martin Goodson <ma...@qubitproducts.com> on 2012/10/12 17:48:24 UTC
How can I read Hive text files on S3 from Pig?
I am trying to load some text files stored in Hive partitions on S3 using the
AllLoader function, with no success. I get an error indicating that
AllLoader expects the files to be on HDFS:
a = LOAD 's3n://xxxxx/yyyyy/zzz' using
org.apache.pig.piggybank.storage.AllLoader();
grunt> 2012-10-12 14:51:26,229 [main] ERROR
org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error.
Wrong FS: s3n://xxxxx/yyyyy/zzz, expected: hdfs://namenode.hadoop.companyname.com
Reading the files with PigStorage works fine, but PigStorage is not aware
of the Hive partition structure, so I cannot query the data using this
method (I have to specify the file manually):
a = LOAD 's3n://xxxxx/yyyyy/zzzZ' using PigStorage();
Is there a way of reading Hive partitions from Pig over S3?
hive-0.9.0
pig-0.10.0
hadoop-0.20
Thank you
Martin
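[For context, one stopgap — sketched here with hypothetical bucket, table, and partition names — is to glob the Hive partition directories explicitly. PigStorage will read the files this way, but the dt=... partition value is not exposed as a column and would have to be recovered separately:]

```pig
-- Sketch only: bucket, table, and partition layout are hypothetical.
-- The glob reaches the files under every Hive partition directory,
-- but the partition value (dt=...) is lost, since PigStorage does
-- not parse the directory structure.
a = LOAD 's3n://mybucket/warehouse/mytable/dt=*/part-*' USING PigStorage('\t');
```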
Re: How can I read Hive text files on S3 from Pig?
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
The same underlying class is used by PigStorage in 0.11, so we should
clean this up to make S3 users happy.
D
Re: How can I read Hive text files on S3 from Pig?
Posted by Martin Goodson <ma...@qubitproducts.com>.
Sure - thanks for having a look. By the way, I've moved to HCatalog and
things look like they are working.
Thanks again
Martin
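[For reference, a minimal sketch of the HCatalog route — database, table, and partition column names are hypothetical, and it assumes the HCatalog jars are registered on Pig's classpath:]

```pig
-- Sketch only: database, table, and column names are hypothetical.
a = LOAD 'mydb.mytable' USING org.apache.hcatalog.pig.HCatLoader();
-- Hive partition columns appear as ordinary fields, so a partition
-- can be selected with FILTER; HCatalog prunes partitions at load time,
-- regardless of whether the table's data lives on HDFS or S3.
b = FILTER a BY dt == '2012-10-12';
DUMP b;
```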
Re: How can I read Hive text files on S3 from Pig?
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Yeah, that's a bug in FileLocalizer; apparently it assumes local or
HDFS only. Could you file a JIRA?
D
Re: How can I read Hive text files on S3 from Pig?
Posted by Martin Goodson <ma...@qubitproducts.com>.
Hi Dmitriy,
here is the stack trace:
java.lang.IllegalArgumentException: Wrong FS: s3n://xxx/yyy/zz/, expected: hdfs://namenode.adsf.companyname.com
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:433)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:523)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:820)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:316)
at org.apache.pig.piggybank.storage.JsonMetadata.findMetaFile(JsonMetadata.java:94)
at org.apache.pig.piggybank.storage.JsonMetadata.getSchema(JsonMetadata.java:154)
at org.apache.pig.piggybank.storage.AllLoader.getSchema(AllLoader.java:400)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Thanks for taking a look. I will start looking into HCatalog too.
Martin
Re: How can I read Hive text files on S3 from Pig?
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Martin,
Do you have the complete stack trace?
Generally, for Hive interop I recommend HCatalog; AllLoader is neat
but it's a 3rd party contrib and we don't really know it too well. I
can check out the error dump and see if there's anything obvious
though.
D