Posted to user@pig.apache.org by Billy Watson <wi...@gmail.com> on 2015/04/20 15:54:21 UTC

Unable to Find S3N Filesystem Pig 0.14 on Hadoop 2.6

I sent the same message to the hadoop mailing list b/c I'm not sure where
the problem lies. I'm pretty sure it's the hadoop client, but the hadoop
peeps may say it's b/c of a misconfiguration within pig, so JIC:

I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line
without issue. I have set some options in hadoop-env.sh to make sure all
the S3 stuff for hadoop 2.6 is set up correctly. (This was very confusing,
BTW, and there's not enough searchable documentation on the changes to the
s3 stuff in hadoop 2.6, IMHO.)
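For anyone finding this thread later: the hadoop-env.sh/S3 setup alluded to above usually also involves the standard s3n credential properties. A minimal, illustrative fragment (the property names are the stock hadoop 2.x ones; the key values are placeholders, and the file is written to a scratch path here so no live config is touched):

```shell
# Illustrative only: the usual s3n credential properties for hadoop 2.x.
# Key values are placeholders, not real credentials; written to /tmp so
# nothing in a live Hadoop config directory is modified.
cat > /tmp/s3n-credentials-snippet.xml <<'EOF'
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
EOF
grep -c '<property>' /tmp/s3n-credentials-snippet.xml
```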

Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
fail in pig, but rather fails in mapreduce with "Error:
java.io.IOException: No FileSystem for scheme: s3n."

I have added [hadoop-install-loc]/lib and
[hadoop-install-loc]/share/hadoop/tools/lib/
to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do
this, the pig job fails at 0% (before it ever gets to mapreduce) with a
very similar "No filesystem for scheme s3n" error.

I feel like at this point I just have to add the share/hadoop/tools/lib
directory (and maybe lib) to the right environment variable, but I can’t
figure out which environment variable that should be.
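For reference, the classpath additions described above can be sketched like this in hadoop-env.sh (the install location is a placeholder standing in for [hadoop-install-loc]; adjust to your layout):

```shell
# Sketch of the hadoop-env.sh addition; /usr/local/hadoop is a placeholder
# for the real install location ([hadoop-install-loc] in the text above).
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
# Append lib/ and share/hadoop/tools/lib/ (where hadoop-aws and friends
# live in 2.6) to whatever HADOOP_CLASSPATH already holds.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HADOOP_HOME/lib/*:$HADOOP_HOME/share/hadoop/tools/lib/*"
echo "$HADOOP_CLASSPATH"
```

As the thread goes on to show, this fixes the client side (the 0% failure) but not the tasks launched by mapreduce, which get their classpath from elsewhere.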

I appreciate any help, thanks!!


Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


William Watson
Software Engineer
(904) 705-7056 PCS

Re: Unable to Find S3N Filesystem Pig 0.14 on Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box
since we are running hadoop in pseudo-distributed mode:
http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs. This
change became necessary in hadoop 2.6.0</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This
    override is necessary so that s3n URLs work on hadoop 2.6.0+</description>
  </property>
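Put together, the two additions above can be sanity-checked offline before touching a live config. A quick sketch (everything is written to a scratch file; the single-quoted heredoc keeps $HADOOP_MAPRED_HOME literal, as it must appear in the config):

```shell
# Write the two properties from above to a scratch file and confirm they
# come back out as expected. This is a sanity check of the snippet itself,
# not of a running cluster; /tmp paths are scratch, not a live config.
cat > /tmp/s3n-config-check.xml <<'EOF'
<configuration>
  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
  </property>
</configuration>
EOF
grep -q 'NativeS3FileSystem' /tmp/s3n-config-check.xml && echo "fs.s3n.impl present"
grep -q 'share/hadoop/tools/lib' /tmp/s3n-config-check.xml && echo "tools jars on MR classpath"
```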



William Watson
Software Engineer
(904) 705-7056 PCS
