Posted to common-user@hadoop.apache.org by Billy Watson <wi...@gmail.com> on 2015/04/20 15:17:06 UTC

Unable to Find S3N Filesystem Hadoop 2.6

Hi,

I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command line
without issue. I have set some options in hadoop-env.sh to make sure all
the S3 stuff for Hadoop 2.6 is set up correctly. (This was very confusing,
BTW, and there is not enough searchable documentation on the changes to the
S3 stuff in Hadoop 2.6, IMHO.)

Anyway, when I run a Pig job that accesses S3, it gets to 16% and does not
fail in Pig, but rather fails in MapReduce with "Error:
java.io.IOException: No FileSystem for scheme: s3n."

I have added [hadoop-install-loc]/lib and
[hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
variable in hadoop-env.sh.erb. When I do not do this, the Pig job fails
at 0% (before it ever gets to MapReduce) with a very similar "No filesystem
for scheme s3n" error.
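
For reference, the relevant hadoop-env.sh.erb lines look roughly like this
(the install location is templated in our setup, so the path below is just a
placeholder):

    # Roughly what we append in hadoop-env.sh.erb (install path is a placeholder)
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:[hadoop-install-loc]/lib/*:[hadoop-install-loc]/share/hadoop/tools/lib/*"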

I feel like at this point I just have to add the share/hadoop/tools/lib
directory (and maybe lib) to the right environment variable, but I can’t
figure out which environment variable that should be.

I appreciate any help, thanks!!


Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
    at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


— Billy Watson

-- 
William Watson
Software Engineer
(904) 705-7056 PCS

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Chris Nauroth <cn...@hortonworks.com>.
I agree with Sato's statement that the service loader mechanism should be able to find the S3N file system classes via the service loader metadata embedded in hadoop-aws.jar.  I expect setting fs.s3n.impl wouldn't be required.  Billy, if you find otherwise in your testing, please let us know.  That might be a bug.
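
If you want to verify what is registered, the service loader metadata is just a text file inside the jar; something like this prints the FileSystem implementations it declares (the jar path and version shown are illustrative for a 2.6.0 install):

    # Print the FileSystem implementations registered via ServiceLoader metadata.
    # Adjust the jar path/version to your installation.
    unzip -p $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar META-INF/services/org.apache.hadoop.fs.FileSystem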

We do still have a feature gap related to AbstractFileSystem (a newer implementation of the Hadoop file system interface accessed by clients through the FileContext class).  In that case, we do not yet support the service loader mechanism, and configuration would be required.  HADOOP-11527 tracks development of the service loader mechanism for AbstractFileSystem.

https://issues.apache.org/jira/browse/HADOOP-11527

Billy, no worries on being busy.  We all understand that the day job takes precedence.  :-)  If you do feel like proposing a documentation patch based on your experiences, then please feel free to attach it to HADOOP-11863.  The community certainly would appreciate it.  The contribution process is documented here:

https://wiki.apache.org/hadoop/HowToContribute

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Billy Watson <wi...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Wednesday, April 22, 2015 at 6:05 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6

Sato,

Also, we did see a different error entirely when we didn't set fs.s3n.impl, but now that we have it working I can try removing that property in development to verify.

But "it has never been done in previous versions" is irrelevant, IMO. This was a big change, so that certainly could have changed; that said, if you're looking at the code, then I'm likely wrong.

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 9:02 AM, Billy Watson <wi...@gmail.com> wrote:
Chris and Sato,

Thanks a bunch! I've been so swamped by these and other issues we've been having in scrambling to upgrade our cluster that I forgot to file a bug. I certainly complained aloud that the docs were insufficient, but I didn't do anything to help the community so thanks a bunch for recognizing that and helping me out!

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <ts...@cloudian.com> wrote:
Hi Billy, Chris,

Let me share a couple of my findings.

I believe this was introduced by HADOOP-10893,
which first appeared in 2.6.0 (HDP 2.2).

1. fs.s3n.impl

> We added a property to the core-site.xml file:

You don't need to set this explicitly; it was never set in previous versions either.

Take a look at FileSystem#loadFileSystem, which is called from FileSystem#getFileSystemClass.
Subclasses of FileSystem are loaded automatically if they are available on the classloader you care about.

So you just need to make sure hadoop-aws.jar is on a classpath.

For the file system shell, this is done in hadoop-env.sh;
for an MR job, in mapreduce.application.classpath;
and for YARN, in yarn.application.classpath.
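
A quick way to confirm the jar is actually visible to the shell (exact paths will vary; you just want a hadoop-aws entry to show up):

    # List the classpath the hadoop command resolves and look for the hadoop-aws jar
    hadoop classpath | tr ':' '\n' | grep -Ei 'hadoop-aws|tools/lib'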

2. mapreduce.application.classpath

> And updated the classpath for mapreduce applications:

Note that on the default HDP 2.2 distribution it points at the MR framework localized via the distributed cache.

    <property>
        <name>mapreduce.application.classpath</name>
        <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
    </property>
* $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains hadoop-aws.jar (which provides the S3N file system, NativeS3FileSystem)

While on vanilla Hadoop, it uses standard paths like yours.

    <property>
        <name>mapreduce.application.classpath</name>
        <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
    </property>

Thanks,
Sato

On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
Hello Billy,

I think your experience indicates that our documentation is insufficient for discussing how to configure and use the alternative file systems.  I filed issue HADOOP-11863 to track a documentation enhancement.

https://issues.apache.org/jira/browse/HADOOP-11863

Please feel free to watch that issue if you'd like to be informed as it makes progress.  Thank you for reporting back to the thread after you had a solution.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Billy Watson <wi...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Monday, April 20, 2015 at 11:14 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6

We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box since we are running Hadoop in pseudo-distributed mode: http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs. This change became necessary in hadoop 2.6.0</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
  </property>

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com> wrote:
Thanks, anyways. Anyone else run into this issue?

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:
Sadly I'll have to pull back; I have only run a Hadoop MapReduce cluster with Amazon EMR.

Sent from my iPhone

On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com> wrote:

This is an install on a CentOS 6 virtual machine used in our test environment. We use HDP in staging and production, and we discovered these issues while trying to build a new cluster using HDP 2.2, which upgrades from Hadoop 2.4 to Hadoop 2.6.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:

One thing I most likely missed completely: are you using an Amazon EMR cluster or something in-house?



---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 16:21, Billy Watson wrote:

I appreciate the response. These JAR files aren't 3rd party. They're included with the Hadoop distribution, but in Hadoop 2.6 they stopped being loaded by default and now they have to be loaded manually, if needed.

Essentially the problem boils down to:

- need to access s3n URLs
- cannot access them without including the tools directory
- after including the tools directory in HADOOP_CLASSPATH, failures start happening later in the job
- need to find the right env variable (or shell script or whatever) to include jets3t & the other JARs needed to access s3n URLs (I think); a quick check is sketched below
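
Something like this shows what ships in that directory (path is a placeholder; names/versions vary by distro):

    # List the S3-related JARs shipped under the tools directory
    ls [hadoop-install-loc]/share/hadoop/tools/lib/ | grep -Ei 'aws|jets3t'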



William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:

You mention an environment variable. In the step before you specify the steps to run to get to the result, you can specify a bash script that will allow you to put any 3rd-party JAR files (for us we used Esri) on the cluster and propagate them to all nodes in the cluster as well. You can ping me off-list if you need further help. Thing is, I haven't used Pig; my boss and coworker wrote the mappers and reducers. Getting these JARs to the entire cluster was a super small and simple bash script, roughly sketched below.
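
(A minimal sketch of what I mean, assuming the JARs sit in an S3 bucket you control and that the target directory is already on the cluster's classpath; the bucket name and directory here are made up, and the right location varies by AMI/distro:)

    #!/bin/bash
    # Hypothetical EMR bootstrap action: copy 3rd-party JARs onto each node into a
    # directory that is already on the Hadoop classpath.
    aws s3 cp s3://my-bootstrap-bucket/jars/ /usr/lib/hadoop/lib/ --recursive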



---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 15:17, Billy Watson wrote:

Hi,

I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for Hadoop 2.6 is set up correctly. (This was very confusing, BTW, and there is not enough searchable documentation on the changes to the S3 stuff in Hadoop 2.6, IMHO.)

Anyway, when I run a Pig job that accesses S3, it gets to 16% and does not fail in Pig, but rather fails in MapReduce with "Error: java.io.IOException: No FileSystem for scheme: s3n."

I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the Pig job fails at 0% (before it ever gets to MapReduce) with a very similar "No filesystem for scheme s3n" error.

I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.

I appreciate any help, thanks!!


Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
    at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


- Billy Watson

--

William Watson
Software Engineer
(904) 705-7056 PCS







Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
Sato,

Also, we did see a different error entirely when we didn't set fs.s3n.impl,
but now that we have it working I can try removing that property in
development to verify.

But "it has never been done in previous versions" is irrelevant, IMO. This
was a big change, so that certainly could have changed; that said, if you're
looking at the code, then I'm likely wrong.

William Watson
Software Engineer
(904) 705-7056 PCS

>>>>>>> at
>>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>>> at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>>
>>>>>>>
>>>>>>> — Billy Watson
>>>>>>>
>>>>>>> --
>>>>>>>  William Watson
>>>>>>> Software Engineer
>>>>>>> (904) 705-7056 PCS
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
Sato,

Also, we did see a different error entirely when we didn't set fs.s3n.impl,
but now that we have it working I can try removing that property in
development to verify.
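
For reference, this is roughly the check I have in mind. It's only a sketch
(the class name is mine, and it assumes hadoop-aws.jar is on the classpath),
but it should show what gets resolved for the s3n scheme without fs.s3n.impl
in core-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class CheckS3nResolution {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Resolve the FileSystem implementation registered for "s3n".
        // If the service loader metadata in hadoop-aws.jar is picked up,
        // this should print
        // org.apache.hadoop.fs.s3native.NativeS3FileSystem even though
        // fs.s3n.impl is not set anywhere.
        Class<? extends FileSystem> cls =
            FileSystem.getFileSystemClass("s3n", conf);
        System.out.println(cls.getName());
      }
    }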

But the "it has never been done in previous versions" is irrelevant, IMO.
This was a big change and that certainly could have changed, but if you're
looking at the code then I'm likely wrong.
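
One quick way to settle it, independent of the Hadoop version, is to list the
service registrations visible on the classpath. Again just a sketch (the
class name is mine), not anything from the docs:

    import java.net.URL;
    import java.util.Enumeration;

    public class ListFileSystemRegistrations {
      public static void main(String[] args) throws Exception {
        // Print every META-INF/services/org.apache.hadoop.fs.FileSystem
        // file visible to the current classloader. If hadoop-aws.jar is
        // on the classpath, one of these URLs should point into that jar,
        // and the file it names should list NativeS3FileSystem.
        Enumeration<URL> urls = Thread.currentThread().getContextClassLoader()
            .getResources("META-INF/services/org.apache.hadoop.fs.FileSystem");
        while (urls.hasMoreElements()) {
          System.out.println(urls.nextElement());
        }
      }
    }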

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 9:02 AM, Billy Watson <wi...@gmail.com>
wrote:

> Chris and Sato,
>
> Thanks a bunch! I've been so swamped by these and other issues we've been
> having in scrambling to upgrade our cluster that I forgot to file a bug. I
> certainly complained aloud that the docs were insufficient, but I didn't do
> anything to help the community so thanks a bunch for recognizing that and
> helping me out!
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <ts...@cloudian.com> wrote:
>
>> Hi Billy, Chris,
>>
>> Let me share a couple of my findings.
>>
>> I believe this was introduced by HADOOP-10893,
>> which was introduced from 2.6.0(HDP2.2).
>>
>> 1. fs.s3n.impl
>>
>> > We added a property to the core-site.xml file:
>>
>> You don't need to explicitly set this. It has never been done so in
>> previous versions.
>>
>> Take a look at FileSystem#loadFileSystem, which is called from
>> FileSystem#getFileSystemClass.
>> Subclasses of FileSystem are loaded automatically if they are available
>> on a classloader you care.
>>
>> So you just need to make sure hadoop-aws.jar is on a classpath.
>>
>> For file system shell, this is done in hadoop-env.sh,
>> while for a MR job, in mapreduce.application.classpath,
>> or for YARN, in yarn.application.classpath.
>>
>> 2. mapreduce.application.classpath
>>
>> > And updated the classpath for mapreduce applications:
>>
>> Note that it points to a distributed cache on the default HDP 2.2
>> distribution.
>>
>>     <property>
>>         <name>mapreduce.application.classpath</name>
>>
>> <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
>>     </property>
>> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
>> hadoop-aws.jar(S3NFileSystem)
>>
>> While on a vanilla hadoop, it looks like standard paths as yours.
>>
>>     <property>
>>         <name>mapreduce.application.classpath</name>
>>
>> <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
>>     </property>
>>
>> Thanks,
>> Sato
>>
>> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cn...@hortonworks.com>
>> wrote:
>>
>>>  Hello Billy,
>>>
>>>  I think your experience indicates that our documentation is
>>> insufficient for discussing how to configure and use the alternative file
>>> systems.  I filed issue HADOOP-11863 to track a documentation enhancement.
>>>
>>>  https://issues.apache.org/jira/browse/HADOOP-11863
>>>
>>>  Please feel free to watch that issue if you'd like to be informed as
>>> it makes progress.  Thank you for reporting back to the thread after you
>>> had a solution.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Billy Watson <wi...@gmail.com>
>>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Date: Monday, April 20, 2015 at 11:14 AM
>>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>>
>>>   We found the correct configs.
>>>
>>>  This post was helpful, but didn't entirely work for us out of the box
>>> since we are using hadoop-pseudo-distributed.
>>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>>
>>>  We added a property to the core-site.xml file:
>>>
>>>    <property>
>>>     <name>fs.s3n.impl</name>
>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>     <description>Tell hadoop which class to use to access s3 URLs. This
>>> change became necessary in hadoop 2.6.0</description>
>>>   </property>
>>>
>>>  And updated the classpath for mapreduce applications:
>>>
>>>    <property>
>>>     <name>mapreduce.application.classpath</name>
>>>
>>> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>>     <description>The classpath specifically for mapreduce jobs. This
>>> override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
>>>   </property>
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <williamrwatson@gmail.com
>>> > wrote:
>>>
>>>> Thanks, anyways. Anyone else run into this issue?
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  Sadly I'll have to pull back I have only run a Hadoop map reduce
>>>>> cluster with Amazon met
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>   This is an install on a CentOS 6 virtual machine used in our test
>>>>> environment. We use HDP in staging and production and we discovered these
>>>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>>>> from Hadoop 2.4 to Hadoop 2.6.
>>>>>
>>>>>   William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>
>>>>>>  One thing I think which i most likely missed completely is are you
>>>>>> using an amazon EMR cluster or something in house?
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
>>>>>>
>>>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>>>
>>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>>>>> loaded by default and now they have to be loaded manually, if needed.
>>>>>>
>>>>>> Essentially the problem boils down to:
>>>>>>
>>>>>> - need to access s3n URLs
>>>>>> - cannot access without including the tools directory
>>>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>>>> happening later in job
>>>>>> - need to find right env variable (or shell script or w/e) to include
>>>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>>>
>>>>>>
>>>>>>
>>>>>>   William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
>>>>>>
>>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>>
>>>>>>>  you mention an environmental variable. the step before you specify
>>>>>>> the steps to run to get to the result. you can specify a bash script that
>>>>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>>>>> the entire cluster was a super small and simple bash script.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>> Regards,
>>>>>>> Jonathan Aquilina
>>>>>>> Founder Eagle Eye T
>>>>>>>
>>>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>>> command line without issue. I have set some options in hadoop-env.sh to
>>>>>>> make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was
>>>>>>> very confusing, BTW and not enough searchable documentation on changes to
>>>>>>> the s3 stuff in hadoop 2.6 IMHO).
>>>>>>>
>>>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%,
>>>>>>> does not fail in pig, but rather fails in mapreduce with "Error:
>>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>>
>>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>>>>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>>>>>>> for scheme s3n" error.
>>>>>>>
>>>>>>> I feel like at this point I just have to add the
>>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right environment
>>>>>>> variable, but I can't figure out which environment variable that should be.
>>>>>>>
>>>>>>> I appreciate any help, thanks!!
>>>>>>>
>>>>>>>
>>>>>>> Stack trace:
>>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>>> at
>>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>>> at
>>>>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>>> at
>>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>>> at
>>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>>> at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>>
>>>>>>>
>>>>>>> — Billy Watson
>>>>>>>
>>>>>>> --
>>>>>>>  William Watson
>>>>>>> Software Engineer
>>>>>>> (904) 705-7056 PCS
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
Chris and Sato,

Thanks a bunch! I've been so swamped by these and other issues while
scrambling to upgrade our cluster that I forgot to file a bug. I certainly
complained aloud that the docs were insufficient, but I didn't do anything to
help the community, so thanks for recognizing that and helping me out!

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <ts...@cloudian.com> wrote:

> Hi Billy, Chris,
>
> Let me share a couple of my findings.
>
> I believe this was introduced by HADOOP-10893,
> which was introduced from 2.6.0(HDP2.2).
>
> 1. fs.s3n.impl
>
> > We added a property to the core-site.xml file:
>
> You don't need to explicitly set this. It has never been done so in
> previous versions.
>
> Take a look at FileSystem#loadFileSystem, which is called from
> FileSystem#getFileSystemClass.
> Subclasses of FileSystem are loaded automatically if they are available on
> a classloader you care.
>
> So you just need to make sure hadoop-aws.jar is on a classpath.
>
> For file system shell, this is done in hadoop-env.sh,
> while for a MR job, in mapreduce.application.classpath,
> or for YARN, in yarn.application.classpath.
>
> 2. mapreduce.application.classpath
>
> > And updated the classpath for mapreduce applications:
>
> Note that it points to a distributed cache on the default HDP 2.2
> distribution.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
>     </property>
> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
> hadoop-aws.jar(S3NFileSystem)
>
> While on a vanilla hadoop, it looks like standard paths as yours.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
>     </property>
>
> Thanks,
> Sato
>
> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>  Hello Billy,
>>
>>  I think your experience indicates that our documentation is
>> insufficient for discussing how to configure and use the alternative file
>> systems.  I filed issue HADOOP-11863 to track a documentation enhancement.
>>
>>  https://issues.apache.org/jira/browse/HADOOP-11863
>>
>>  Please feel free to watch that issue if you'd like to be informed as it
>> makes progress.  Thank you for reporting back to the thread after you had a
>> solution.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Billy Watson <wi...@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Monday, April 20, 2015 at 11:14 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>
>>   We found the correct configs.
>>
>>  This post was helpful, but didn't entirely work for us out of the box
>> since we are using hadoop-pseudo-distributed.
>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>
>>  We added a property to the core-site.xml file:
>>
>>    <property>
>>     <name>fs.s3n.impl</name>
>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>     <description>Tell hadoop which class to use to access s3 URLs. This
>> change became necessary in hadoop 2.6.0</description>
>>   </property>
>>
>>  And updated the classpath for mapreduce applications:
>>
>>    <property>
>>     <name>mapreduce.application.classpath</name>
>>
>> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>     <description>The classpath specifically for mapreduce jobs. This
>> override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
>>   </property>
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>
>> wrote:
>>
>>> Thanks, anyways. Anyone else run into this issue?
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  Sadly I'll have to pull back I have only run a Hadoop map reduce
>>>> cluster with Amazon met
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com>
>>>> wrote:
>>>>
>>>>   This is an install on a CentOS 6 virtual machine used in our test
>>>> environment. We use HDP in staging and production and we discovered these
>>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>>> from Hadoop 2.4 to Hadoop 2.6.
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  One thing I think which i most likely missed completely is are you
>>>>> using an amazon EMR cluster or something in house?
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>>
>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>>>> loaded by default and now they have to be loaded manually, if needed.
>>>>>
>>>>> Essentially the problem boils down to:
>>>>>
>>>>> - need to access s3n URLs
>>>>> - cannot access without including the tools directory
>>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>>> happening later in job
>>>>> - need to find right env variable (or shell script or w/e) to include
>>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>>
>>>>>
>>>>>
>>>>>   William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>
>>>>>>  you mention an environmental variable. the step before you specify
>>>>>> the steps to run to get to the result. you can specify a bash script that
>>>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>>>> the entire cluster was a super small and simple bash script.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
>>>>>>
>>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>> command line without issue. I have set some options in hadoop-env.sh to
>>>>>> make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was
>>>>>> very confusing, BTW and not enough searchable documentation on changes to
>>>>>> the s3 stuff in hadoop 2.6 IMHO).
>>>>>>
>>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>
>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>>>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>>>>>> for scheme s3n" error.
>>>>>>
>>>>>> I feel like at this point I just have to add the
>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right environment
>>>>>> variable, but I can't figure out which environment variable that should be.
>>>>>>
>>>>>> I appreciate any help, thanks!!
>>>>>>
>>>>>>
>>>>>> Stack trace:
>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>> at
>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>> at
>>>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>> at
>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>> at
>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>> at
>>>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>
>>>>>>
>>>>>> — Billy Watson
>>>>>>
>>>>>> --
>>>>>>  William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
Chris and Sato,

Thanks a bunch! I've been so swamped by these and other issues we've been
having in scrambling to upgrade our cluster that I forgot to file a bug. I
certainly complained aloud that the docs were insufficient, but I didn't do
anything to help the community so thanks a bunch for recognizing that and
helping me out!

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <ts...@cloudian.com> wrote:

> Hi Billy, Chris,
>
> Let me share a couple of my findings.
>
> I believe this was introduced by HADOOP-10893,
> which was introduced from 2.6.0(HDP2.2).
>
> 1. fs.s3n.impl
>
> > We added a property to the core-site.xml file:
>
> You don't need to explicitly set this. It has never been done so in
> previous versions.
>
> Take a look at FileSystem#loadFileSystem, which is called from
> FileSystem#getFileSystemClass.
> Subclasses of FileSystem are loaded automatically if they are available on
> a classloader you care.
>
> So you just need to make sure hadoop-aws.jar is on a classpath.
>
> For file system shell, this is done in hadoop-env.sh,
> while for a MR job, in mapreduce.application.classpath,
> or for YARN, in yarn.application.classpath.
>
> 2. mapreduce.application.classpath
>
> > And updated the classpath for mapreduce applications:
>
> Note that it points to a distributed cache on the default HDP 2.2
> distribution.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
>     </property>
> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
> hadoop-aws.jar(S3NFileSystem)
>
> While on a vanilla hadoop, it looks like standard paths as yours.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
>     </property>
>
> Thanks,
> Sato
>
> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>  Hello Billy,
>>
>>  I think your experience indicates that our documentation is
>> insufficient for discussing how to configure and use the alternative file
>> systems.  I filed issue HADOOP-11863 to track a documentation enhancement.
>>
>>  https://issues.apache.org/jira/browse/HADOOP-11863
>>
>>  Please feel free to watch that issue if you'd like to be informed as it
>> makes progress.  Thank you for reporting back to the thread after you had a
>> solution.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Billy Watson <wi...@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Monday, April 20, 2015 at 11:14 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>
>>   We found the correct configs.
>>
>>  This post was helpful, but didn't entirely work for us out of the box
>> since we are using hadoop-pseudo-distributed.
>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>
>>  We added a property to the core-site.xml file:
>>
>>    <property>
>>     <name>fs.s3n.impl</name>
>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>     <description>Tell hadoop which class to use to access s3 URLs. This
>> change became necessary in hadoop 2.6.0</description>
>>   </property>
>>
>>  And updated the classpath for mapreduce applications:
>>
>>    <property>
>>     <name>mapreduce.application.classpath</name>
>>
>> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>     <description>The classpath specifically for mapreduce jobs. This
>> override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
>>   </property>
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>
>> wrote:
>>
>>> Thanks, anyways. Anyone else run into this issue?
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  Sadly I'll have to pull back I have only run a Hadoop map reduce
>>>> cluster with Amazon met
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com>
>>>> wrote:
>>>>
>>>>   This is an install on a CentOS 6 virtual machine used in our test
>>>> environment. We use HDP in staging and production and we discovered these
>>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>>> from Hadoop 2.4 to Hadoop 2.6.
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  One thing I think which i most likely missed completely is are you
>>>>> using an amazon EMR cluster or something in house?
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>>
>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>>>> loaded by default and now they have to be loaded manually, if needed.
>>>>>
>>>>> Essentially the problem boils down to:
>>>>>
>>>>> - need to access s3n URLs
>>>>> - cannot access without including the tools directory
>>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>>> happening later in job
>>>>> - need to find right env variable (or shell script or w/e) to include
>>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>>
>>>>>
>>>>>
>>>>>   William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>
>>>>>>  you mention an environmental variable. the step before you specify
>>>>>> the steps to run to get to the result. you can specify a bash script that
>>>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>>>> the entire cluster was a super small and simple bash script.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
>>>>>>
>>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>> command line without issue. I have set some options in hadoop-env.sh to
>>>>>> make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was
>>>>>> very confusing, BTW and not enough searchable documentation on changes to
>>>>>> the s3 stuff in hadoop 2.6 IMHO).
>>>>>>
>>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>
>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>>>> at 0% (before it ever gets to mapreduce) with a very similar "No filesystem
>>>>>> for scheme s3n" error.
>>>>>>
>>>>>> I feel like at this point I just have to add the
>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right environment
>>>>>> variable, but I can't figure out which environment variable that should be.
>>>>>>
>>>>>> I appreciate any help, thanks!!
>>>>>>
>>>>>>
>>>>>> Stack trace:
>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>> at
>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>> at
>>>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>> at
>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>> at
>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>> at
>>>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>
>>>>>>
>>>>>> — Billy Watson
>>>>>>
>>>>>> --
>>>>>>  William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
Chris and Sato,

Thanks a bunch! I've been so swamped by these and other issues we've been
having in scrambling to upgrade our cluster that I forgot to file a bug. I
certainly complained aloud that the docs were insufficient, but I didn't do
anything to help the community so thanks a bunch for recognizing that and
helping me out!

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <ts...@cloudian.com> wrote:

> Hi Billy, Chris,
>
> Let me share a couple of my findings.
>
> I believe this was introduced by HADOOP-10893,
> which was introduced from 2.6.0(HDP2.2).
>
> 1. fs.s3n.impl
>
> > We added a property to the core-site.xml file:
>
> You don't need to explicitly set this. It has never been done so in
> previous versions.
>
> Take a look at FileSystem#loadFileSystem, which is called from
> FileSystem#getFileSystemClass.
> Subclasses of FileSystem are loaded automatically if they are available on
> a classloader you care.
>
> So you just need to make sure hadoop-aws.jar is on a classpath.
>
> For file system shell, this is done in hadoop-env.sh,
> while for a MR job, in mapreduce.application.classpath,
> or for YARN, in yarn.application.classpath.
>
> 2. mapreduce.application.classpath
>
> > And updated the classpath for mapreduce applications:
>
> Note that it points to a distributed cache on the default HDP 2.2
> distribution.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
>     </property>
> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
> hadoop-aws.jar(S3NFileSystem)
>
> While on a vanilla hadoop, it looks like standard paths as yours.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
>     </property>
>
> Thanks,
> Sato
>
> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>>  Hello Billy,
>>
>>  I think your experience indicates that our documentation is
>> insufficient for discussing how to configure and use the alternative file
>> systems.  I filed issue HADOOP-11863 to track a documentation enhancement.
>>
>>  https://issues.apache.org/jira/browse/HADOOP-11863
>>
>>  Please feel free to watch that issue if you'd like to be informed as it
>> makes progress.  Thank you for reporting back to the thread after you had a
>> solution.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Billy Watson <wi...@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Date: Monday, April 20, 2015 at 11:14 AM
>> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>
>>   We found the correct configs.
>>
>>  This post was helpful, but didn't entirely work for us out of the box
>> since we are using hadoop-pseudo-distributed.
>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>
>>  We added a property to the core-site.xml file:
>>
>>    <property>
>>     <name>fs.s3n.impl</name>
>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>     <description>Tell hadoop which class to use to access s3 URLs. This
>> change became necessary in hadoop 2.6.0</description>
>>   </property>
>>
>>  And updated the classpath for mapreduce applications:
>>
>>    <property>
>>     <name>mapreduce.application.classpath</name>
>>
>> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>     <description>The classpath specifically for mapreduce jobs. This
>> override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
>>   </property>
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>
>> wrote:
>>
>>> Thanks, anyways. Anyone else run into this issue?
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  Sadly I'll have to pull back I have only run a Hadoop map reduce
>>>> cluster with Amazon met
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com>
>>>> wrote:
>>>>
>>>>   This is an install on a CentOS 6 virtual machine used in our test
>>>> environment. We use HDP in staging and production and we discovered these
>>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>>> from Hadoop 2.4 to Hadoop 2.6.
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  One thing I think which i most likely missed completely is are you
>>>>> using an amazon EMR cluster or something in house?
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>>
>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>>>> loaded by default and now they have to be loaded manually, if needed.
>>>>>
>>>>> Essentially the problem boils down to:
>>>>>
>>>>> - need to access s3n URLs
>>>>> - cannot access without including the tools directory
>>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>>> happening later in job
>>>>> - need to find right env variable (or shell script or w/e) to include
>>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>>
>>>>>
>>>>>
>>>>>   William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>
>>>>>>  you mention an environmental variable. the step before you specify
>>>>>> the steps to run to get to the result. you can specify a bash script that
>>>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>>>> the entire cluster was a super small and simple bash script.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
>>>>>>
>>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>> command line without issue. I have set some options in hadoop-env.sh to
>>>>>> make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was
>>>>>> very confusing, BTW and not enough searchable documentation on changes to
>>>>>> the s3 stuff in hadoop 2.6 IMHO).
>>>>>>
>>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>
>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>>>> at 0% (before it ever gets to mapreduce) with a very similar "No filesystem
>>>>>> for scheme s3n" error.
>>>>>>
>>>>>> I feel like at this point I just have to add the
>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right environment
>>>>>> variable, but I can't figure out which environment variable that should be.
>>>>>>
>>>>>> I appreciate any help, thanks!!
>>>>>>
>>>>>>
>>>>>> Stack trace:
>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>> at
>>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>> at
>>>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>> at
>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>> at
>>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>> at
>>>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>
>>>>>>
>>>>>> — Billy Watson
>>>>>>
>>>>>> --
>>>>>>  William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Takenori Sato <ts...@cloudian.com>.
Hi Billy, Chris,

Let me share a couple of my findings.

I believe this was introduced by HADOOP-10893,
which first appeared in 2.6.0 (HDP 2.2).

1. fs.s3n.impl

> We added a property to the core-site.xml file:

You don't need to set this explicitly; it was never required in
previous versions.

Take a look at FileSystem#loadFileSystem, which is called from
FileSystem#getFileSystemClass.
Subclasses of FileSystem are loaded automatically if they are available to
the classloader you care about.

So you just need to make sure hadoop-aws.jar is on a classpath.
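
A quick way to sanity-check this from the shell (only a sketch - the
install prefix and jar version are assumptions for a vanilla 2.6.0 layout,
so adjust the paths to yours):

    # 1) the service-loader metadata Hadoop uses to discover file systems;
    #    it should list org.apache.hadoop.fs.s3native.NativeS3FileSystem
    unzip -p /hadoop-2.6.0/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar \
        META-INF/services/org.apache.hadoop.fs.FileSystem

    # 2) confirm the tools lib directory is on the client classpath
    hadoop classpath | tr ':' '\n' | grep tools/lib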

For the file system shell, this is done in hadoop-env.sh;
for an MR job, in mapreduce.application.classpath;
and for YARN, in yarn.application.classpath.
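
For the shell case that boils down to something like this in hadoop-env.sh
(only a sketch - the install prefix is an assumption, and MR/YARN jobs
should instead go through the two properties just mentioned):

    # hadoop-env.sh: make the tools jars (including hadoop-aws.jar) visible
    # to `hadoop fs` and other client-side commands
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/hadoop-2.6.0/share/hadoop/tools/lib/*"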

2. mapreduce.application.classpath

> And updated the classpath for mapreduce applications:

Note that on the default HDP 2.2 distribution it points into the
distributed cache (the localized MR framework tarball).

    <property>
        <name>mapreduce.application.classpath</name>

<value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
    </property>
* $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
hadoop-aws.jar (which provides NativeS3FileSystem)
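
On an HDP node you can also double-check which jar actually ships the class
(again only a sketch; /usr/hdp is the default install root on HDP 2.2):

    find /usr/hdp -name 'hadoop-aws*.jar' 2>/dev/null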

On a vanilla Hadoop install, by contrast, it uses standard paths like yours.

    <property>
        <name>mapreduce.application.classpath</name>

<value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
    </property>
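
A quick way to see which of these two styles a client actually picks up is to
print the effective value. This is just a sketch (the class name is invented);
it assumes the cluster's configuration directory is on the classpath, e.g.
when run through the `hadoop` launcher:

    // Sketch: print the mapreduce.application.classpath the client would submit with.
    // JobConf pulls in mapred-default.xml and mapred-site.xml from the classpath.
    import org.apache.hadoop.mapred.JobConf;

    public class ShowMrAppClasspath {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        System.out.println(conf.get("mapreduce.application.classpath",
            "(not set; the built-in MapReduce default will be used)"));
      }
    }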

Thanks,
Sato

On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

>  Hello Billy,
>
>  I think your experience indicates that our documentation is insufficient
> for discussing how to configure and use the alternative file systems.  I
> filed issue HADOOP-11863 to track a documentation enhancement.
>
>  https://issues.apache.org/jira/browse/HADOOP-11863
>
>  Please feel free to watch that issue if you'd like to be informed as it
> makes progress.  Thank you for reporting back to the thread after you had a
> solution.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Billy Watson <wi...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Monday, April 20, 2015 at 11:14 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>
>   We found the correct configs.
>
>  This post was helpful, but didn't entirely work for us out of the box
> since we are using hadoop-pseudo-distributed.
> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>
>  We added a property to the core-site.xml file:
>
>    <property>
>     <name>fs.s3n.impl</name>
>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>     <description>Tell hadoop which class to use to access s3 URLs. This
> change became necessary in hadoop 2.6.0</description>
>   </property>
>
>  And updated the classpath for mapreduce applications:
>
>    <property>
>     <name>mapreduce.application.classpath</name>
>
> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>     <description>The classpath specifically for mapreduce jobs. This
> override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
>   </property>
>
>   William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>
> wrote:
>
>> Thanks, anyways. Anyone else run into this issue?
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>> jaquilina@eagleeyet.net> wrote:
>>
>>>  Sadly I'll have to pull back I have only run a Hadoop map reduce
>>> cluster with Amazon met
>>>
>>> Sent from my iPhone
>>>
>>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com> wrote:
>>>
>>>   This is an install on a CentOS 6 virtual machine used in our test
>>> environment. We use HDP in staging and production and we discovered these
>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>> from Hadoop 2.4 to Hadoop 2.6.
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  One thing I think which i most likely missed completely is are you
>>>> using an amazon EMR cluster or something in house?
>>>>
>>>>
>>>>
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>>
>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>
>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>>> loaded by default and now they have to be loaded manually, if needed.
>>>>
>>>> Essentially the problem boils down to:
>>>>
>>>> - need to access s3n URLs
>>>> - cannot access without including the tools directory
>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>> happening later in job
>>>> - need to find right env variable (or shell script or w/e) to include
>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>
>>>>
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  you mention an environmental variable. the step before you specify
>>>>> the steps to run to get to the result. you can specify a bash script that
>>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>>> the entire cluster was a super small and simple bash script.
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>>>>> line without issue. I have set some options in hadoop-env.sh to make sure
>>>>> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
>>>>> confusing, BTW and not enough searchable documentation on changes to the s3
>>>>> stuff in hadoop 2.6 IMHO).
>>>>>
>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>
>>>>> I have added [hadoop-install-loc]/lib and
>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>>>>> for scheme s3n" error.
>>>>>
>>>>> I feel like at this point I just have to add the
>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right environment
>>>>> variable, but I can't figure out which environment variable that should be.
>>>>>
>>>>> I appreciate any help, thanks!!
>>>>>
>>>>>
>>>>> Stack trace:
>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>> at
>>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>> at
>>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>> at
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>> at
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>> at
>>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>
>>>>>
>>>>> — Billy Watson
>>>>>
>>>>> --
>>>>>  William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>>
>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Billy,

I think your experience indicates that our documentation is insufficient for discussing how to configure and use the alternative file systems.  I filed issue HADOOP-11863 to track a documentation enhancement.

https://issues.apache.org/jira/browse/HADOOP-11863

Please feel free to watch that issue if you'd like to be informed as it makes progress.  Thank you for reporting back to the thread after you had a solution.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Billy Watson <wi...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Monday, April 20, 2015 at 11:14 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6

We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box since we are using hadoop-pseudo-distributed. http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs. This change became necessary in hadoop 2.6.0</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
  </property>

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>> wrote:
Thanks, anyways. Anyone else run into this issue?

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <ja...@eagleeyet.net>> wrote:
Sadly I'll have to pull back I have only run a Hadoop map reduce cluster with Amazon met

Sent from my iPhone

On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com>> wrote:

This is an install on a CentOS 6 virtual machine used in our test environment. We use HDP in staging and production and we discovered these issues while trying to build a new cluster using HDP 2.2 which upgrades from Hadoop 2.4 to Hadoop 2.6.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <ja...@eagleeyet.net>> wrote:

One thing I think which i most likely missed completely is are you using an amazon EMR cluster or something in house?



---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 16:21, Billy Watson wrote:

I appreciate the response. These JAR files aren't 3rd party. They're included with the Hadoop distribution, but in Hadoop 2.6 they stopped being loaded by default and now they have to be loaded manually, if needed.

Essentially the problem boils down to:

- need to access s3n URLs
- cannot access without including the tools directory
- after including tools directory in HADOOP_CLASSPATH, failures start happening later in job
- need to find right env variable (or shell script or w/e) to include jets3t & other JARs needed to access s3n URLs (I think)



William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <ja...@eagleeyet.net>> wrote:

you mention an environmental variable. the step before you specify the steps to run to get to the result. you can specify a bash script that will allow you to put any 3rd party jar files, for us we used esri, on the cluster and propagate them to all nodes in the cluster as well. You can ping me off list if you need further help. Thing is I havent used pig but my boss and coworker wrote the mappers and reducers. to get these jars to the entire cluster was a super small and simple bash script.



---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 15:17, Billy Watson wrote:

Hi,

I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was very confusing, BTW and not enough searchable documentation on changes to the s3 stuff in hadoop 2.6 IMHO).

Anyways, when I run a pig job which accesses s3, it gets to 16%, does not fail in pig, but rather fails in mapreduce with "Error: java.io.IOException: No FileSystem for scheme: s3n."

I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail at 0% (before it ever gets to mapreduce) with a very similar "No fileystem for scheme s3n" error.

I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.

I appreciate any help, thanks!!


Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467) at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


- Billy Watson

--

William Watson
Software Engineer
(904) 705-7056 PCS




Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box since
we are using hadoop-pseudo-distributed.
http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs. This
change became necessary in hadoop 2.6.0</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>

<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This
override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
  </property>
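
If you want to double-check whether the explicit fs.s3n.impl property is
really needed on your setup, a rough sketch like the following (class name
invented) lists every FileSystem implementation the service loader can see on
the current classpath. If NativeS3FileSystem shows up, the property override
should be redundant; if iteration fails with a ServiceConfigurationError, one
of the registered implementations is generally missing a dependency jar.

    // Sketch: enumerate FileSystem implementations registered via
    // META-INF/services/org.apache.hadoop.fs.FileSystem on the classpath.
    import java.util.ServiceLoader;
    import org.apache.hadoop.fs.FileSystem;

    public class ListRegisteredFileSystems {
      public static void main(String[] args) {
        for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
          System.out.println(fs.getClass().getName());
        }
      }
    }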

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>
wrote:

> Thanks, anyways. Anyone else run into this issue?
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
> jaquilina@eagleeyet.net> wrote:
>
>> Sadly I'll have to pull back I have only run a Hadoop map reduce cluster
>> with Amazon met
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com> wrote:
>>
>> This is an install on a CentOS 6 virtual machine used in our test
>> environment. We use HDP in staging and production and we discovered these
>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>> from Hadoop 2.4 to Hadoop 2.6.
>>
>> William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>> jaquilina@eagleeyet.net> wrote:
>>
>>>  One thing I think which i most likely missed completely is are you
>>> using an amazon EMR cluster or something in house?
>>>
>>>
>>>
>>> ---
>>> Regards,
>>> Jonathan Aquilina
>>> Founder Eagle Eye T
>>>
>>>  On 2015-04-20 16:21, Billy Watson wrote:
>>>
>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>> loaded by default and now they have to be loaded manually, if needed.
>>>
>>> Essentially the problem boils down to:
>>>
>>> - need to access s3n URLs
>>> - cannot access without including the tools directory
>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>> happening later in job
>>> - need to find right env variable (or shell script or w/e) to include
>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>
>>>
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  you mention an environmental variable. the step before you specify
>>>> the steps to run to get to the result. you can specify a bash script that
>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>> the entire cluster was a super small and simple bash script.
>>>>
>>>>
>>>>
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>>
>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>>>> line without issue. I have set some options in hadoop-env.sh to make sure
>>>> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
>>>> confusing, BTW and not enough searchable documentation on changes to the s3
>>>> stuff in hadoop 2.6 IMHO).
>>>>
>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>
>>>> I have added [hadoop-install-loc]/lib and
>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>>>> for scheme s3n" error.
>>>>
>>>> I feel like at this point I just have to add the share/hadoop/tools/lib
>>>> directory (and maybe lib) to the right environment variable, but I can't
>>>> figure out which environment variable that should be.
>>>>
>>>> I appreciate any help, thanks!!
>>>>
>>>>
>>>> Stack trace:
>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>> at
>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>> at
>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>> at
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>> at
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>> at
>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>
>>>>
>>>> — Billy Watson
>>>>
>>>> --
>>>>  William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box since
we are using hadoop-pseudo-distributed.
http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs. This
change became necessary in hadoop 2.6.0</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>

<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for MapReduce jobs. This
override is necessary so that s3n URLs work on Hadoop 2.6.0+.</description>
  </property>
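
For reference, a couple of quick sanity checks along these lines should
confirm the wiring after a restart (sketch only -- $HADOOP_MAPRED_HOME and
the bucket name below are placeholders):

  # The AWS/jets3t jars should really be under tools/lib
  ls $HADOOP_MAPRED_HOME/share/hadoop/tools/lib/ | grep -Ei 'hadoop-aws|jets3t'

  # Client-side access should still work
  hadoop fs -ls s3n://my-s3-bucket/

  # The pig job should now get past the map phase, since the task classpath
  # picks up tools/lib via mapreduce.application.classpath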

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <wi...@gmail.com>
wrote:

> Thanks, anyways. Anyone else run into this issue?
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
> jaquilina@eagleeyet.net> wrote:
>
>> Sadly I'll have to pull back I have only run a Hadoop map reduce cluster
>> with Amazon met
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com> wrote:
>>
>> This is an install on a CentOS 6 virtual machine used in our test
>> environment. We use HDP in staging and production and we discovered these
>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>> from Hadoop 2.4 to Hadoop 2.6.
>>
>> William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>> jaquilina@eagleeyet.net> wrote:
>>
>>>  One thing I think which i most likely missed completely is are you
>>> using an amazon EMR cluster or something in house?
>>>
>>>
>>>
>>> ---
>>> Regards,
>>> Jonathan Aquilina
>>> Founder Eagle Eye T
>>>
>>>  On 2015-04-20 16:21, Billy Watson wrote:
>>>
>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>> loaded by default and now they have to be loaded manually, if needed.
>>>
>>> Essentially the problem boils down to:
>>>
>>> - need to access s3n URLs
>>> - cannot access without including the tools directory
>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>> happening later in job
>>> - need to find right env variable (or shell script or w/e) to include
>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>
>>>
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  you mention an environmental variable. the step before you specify
>>>> the steps to run to get to the result. you can specify a bash script that
>>>> will allow you to put any 3rd party jar files, for us we used esri, on the
>>>> cluster and propagate them to all nodes in the cluster as well. You can
>>>> ping me off list if you need further help. Thing is I havent used pig but
>>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>>> the entire cluster was a super small and simple bash script.
>>>>
>>>>
>>>>
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>>
>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>>>> line without issue. I have set some options in hadoop-env.sh to make sure
>>>> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
>>>> confusing, BTW and not enough searchable documentation on changes to the s3
>>>> stuff in hadoop 2.6 IMHO).
>>>>
>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>
>>>> I have added [hadoop-install-loc]/lib and
>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>>>> for scheme s3n" error.
>>>>
>>>> I feel like at this point I just have to add the share/hadoop/tools/lib
>>>> directory (and maybe lib) to the right environment variable, but I can't
>>>> figure out which environment variable that should be.
>>>>
>>>> I appreciate any help, thanks!!
>>>>
>>>>
>>>> Stack trace:
>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>> at
>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>> at
>>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>> at
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>> at
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>> at
>>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>>> java.security.AccessController.doPrivileged(Native Method) at
>>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>
>>>>
>>>> — Billy Watson
>>>>
>>>> --
>>>>  William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>>
>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
Thanks, anyways. Anyone else run into this issue?

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <jaquilina@eagleeyet.net
> wrote:

> Sadly I'll have to pull back I have only run a Hadoop map reduce cluster
> with Amazon met
>
> Sent from my iPhone
>
> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com> wrote:
>
> This is an install on a CentOS 6 virtual machine used in our test
> environment. We use HDP in staging and production and we discovered these
> issues while trying to build a new cluster using HDP 2.2 which upgrades
> from Hadoop 2.4 to Hadoop 2.6.
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
> jaquilina@eagleeyet.net> wrote:
>
>>  One thing I think which i most likely missed completely is are you
>> using an amazon EMR cluster or something in house?
>>
>>
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>>  On 2015-04-20 16:21, Billy Watson wrote:
>>
>> I appreciate the response. These JAR files aren't 3rd party. They're
>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>> loaded by default and now they have to be loaded manually, if needed.
>>
>> Essentially the problem boils down to:
>>
>> - need to access s3n URLs
>> - cannot access without including the tools directory
>> - after including tools directory in HADOOP_CLASSPATH, failures start
>> happening later in job
>> - need to find right env variable (or shell script or w/e) to include
>> jets3t & other JARs needed to access s3n URLs (I think)
>>
>>
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>> jaquilina@eagleeyet.net> wrote:
>>
>>>  you mention an environmental variable. the step before you specify the
>>> steps to run to get to the result. you can specify a bash script that will
>>> allow you to put any 3rd party jar files, for us we used esri, on the
>>> cluster and propagate them to all nodes in the cluster as well. You can
>>> ping me off list if you need further help. Thing is I havent used pig but
>>> my boss and coworker wrote the mappers and reducers. to get these jars to
>>> the entire cluster was a super small and simple bash script.
>>>
>>>
>>>
>>> ---
>>> Regards,
>>> Jonathan Aquilina
>>> Founder Eagle Eye T
>>>
>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>
>>> Hi,
>>>
>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>>> line without issue. I have set some options in hadoop-env.sh to make sure
>>> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
>>> confusing, BTW and not enough searchable documentation on changes to the s3
>>> stuff in hadoop 2.6 IMHO).
>>>
>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>> not fail in pig, but rather fails in mapreduce with "Error:
>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>
>>> I have added [hadoop-install-loc]/lib and
>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>>> for scheme s3n" error.
>>>
>>> I feel like at this point I just have to add the share/hadoop/tools/lib
>>> directory (and maybe lib) to the right environment variable, but I can't
>>> figure out which environment variable that should be.
>>>
>>> I appreciate any help, thanks!!
>>>
>>>
>>> Stack trace:
>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>> at
>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>> at
>>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>> at
>>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>>> java.security.AccessController.doPrivileged(Native Method) at
>>> javax.security.auth.Subject.doAs(Subject.java:415) at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>
>>>
>>> — Billy Watson
>>>
>>> --
>>>  William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>>
>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.
Sadly I'll have to pull back I have only run a Hadoop map reduce cluster with Amazon met

Sent from my iPhone

> On 20 Apr 2015, at 16:53, Billy Watson <wi...@gmail.com> wrote:
> 
> This is an install on a CentOS 6 virtual machine used in our test environment. We use HDP in staging and production and we discovered these issues while trying to build a new cluster using HDP 2.2 which upgrades from Hadoop 2.4 to Hadoop 2.6. 
> 
> William Watson
> Software Engineer
> (904) 705-7056 PCS
> 
>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:
>> One thing I think which i most likely missed completely is are you using an amazon EMR cluster or something in house?
>> 
>>  
>> 
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>> On 2015-04-20 16:21, Billy Watson wrote:
>>> 
>>> I appreciate the response. These JAR files aren't 3rd party. They're included with the Hadoop distribution, but in Hadoop 2.6 they stopped being loaded by default and now they have to be loaded manually, if needed. 
>>>  
>>> Essentially the problem boils down to:
>>>  
>>> - need to access s3n URLs
>>> - cannot access without including the tools directory
>>> - after including tools directory in HADOOP_CLASSPATH, failures start happening later in job
>>> - need to find right env variable (or shell script or w/e) to include jets3t & other JARs needed to access s3n URLs (I think)
>>>  
>>>  
>>> 
>>> William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>> 
>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:
>>>> you mention an environmental variable. the step before you specify the steps to run to get to the result. you can specify a bash script that will allow you to put any 3rd party jar files, for us we used esri, on the cluster and propagate them to all nodes in the cluster as well. You can ping me off list if you need further help. Thing is I havent used pig but my boss and coworker wrote the mappers and reducers. to get these jars to the entire cluster was a super small and simple bash script.
>>>> 
>>>>  
>>>> 
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>> On 2015-04-20 15:17, Billy Watson wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was very confusing, BTW and not enough searchable documentation on changes to the s3 stuff in hadoop 2.6 IMHO).
>>>> 
>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not fail in pig, but rather fails in mapreduce with "Error: java.io.IOException: No FileSystem for scheme: s3n." 
>>>> 
>>>> I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail at 0% (before it ever gets to mapreduce) with a very similar "No fileystem for scheme s3n" error.
>>>> 
>>>> I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.
>>>> 
>>>> I appreciate any help, thanks!!
>>>> 
>>>> 
>>>> Stack trace:
>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467) at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>> 
>>>> 
>>>> — Billy Watson
>>>> 
>>>> --
>>>> 
>>>> William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
> 

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
This is an install on a CentOS 6 virtual machine used in our test
environment. We use HDP in staging and production and we discovered these
issues while trying to build a new cluster using HDP 2.2 which upgrades
from Hadoop 2.4 to Hadoop 2.6.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <jaquilina@eagleeyet.net
> wrote:

>  One thing I think which i most likely missed completely is are you using
> an amazon EMR cluster or something in house?
>
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-04-20 16:21, Billy Watson wrote:
>
> I appreciate the response. These JAR files aren't 3rd party. They're
> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
> loaded by default and now they have to be loaded manually, if needed.
>
> Essentially the problem boils down to:
>
> - need to access s3n URLs
> - cannot access without including the tools directory
> - after including tools directory in HADOOP_CLASSPATH, failures start
> happening later in job
> - need to find right env variable (or shell script or w/e) to include
> jets3t & other JARs needed to access s3n URLs (I think)
>
>
>
>   William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
> jaquilina@eagleeyet.net> wrote:
>
>>  you mention an environmental variable. the step before you specify the
>> steps to run to get to the result. you can specify a bash script that will
>> allow you to put any 3rd party jar files, for us we used esri, on the
>> cluster and propagate them to all nodes in the cluster as well. You can
>> ping me off list if you need further help. Thing is I havent used pig but
>> my boss and coworker wrote the mappers and reducers. to get these jars to
>> the entire cluster was a super small and simple bash script.
>>
>>
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>>   On 2015-04-20 15:17, Billy Watson wrote:
>>
>> Hi,
>>
>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>> line without issue. I have set some options in hadoop-env.sh to make sure
>> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
>> confusing, BTW and not enough searchable documentation on changes to the s3
>> stuff in hadoop 2.6 IMHO).
>>
>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
>> fail in pig, but rather fails in mapreduce with "Error:
>> java.io.IOException: No FileSystem for scheme: s3n."
>>
>> I have added [hadoop-install-loc]/lib and
>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
>> for scheme s3n" error.
>>
>> I feel like at this point I just have to add the share/hadoop/tools/lib
>> directory (and maybe lib) to the right environment variable, but I can't
>> figure out which environment variable that should be.
>>
>> I appreciate any help, thanks!!
>>
>>
>> Stack trace:
>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
>> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
>> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>> at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>> at
>> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>> at
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
>> java.security.AccessController.doPrivileged(Native Method) at
>> javax.security.auth.Subject.doAs(Subject.java:415) at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>
>>
>> — Billy Watson
>>
>> --
>>  William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>>

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.
 

One thing I think I most likely missed completely: are you using an Amazon
EMR cluster or something in-house?

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 16:21, Billy Watson wrote: 

> I appreciate the response. These JAR files aren't 3rd party. They're included with the Hadoop distribution, but in Hadoop 2.6 they stopped being loaded by default and now they have to be loaded manually, if needed. 
> 
> Essentially the problem boils down to: 
> 
> - need to access s3n URLs 
> - cannot access without including the tools directory 
> - after including tools directory in HADOOP_CLASSPATH, failures start happening later in job 
> - need to find right env variable (or shell script or w/e) to include jets3t & other JARs needed to access s3n URLs (I think) 
> 
> William Watson
> Software Engineer 
> (904) 705-7056 PCS 
> 
> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:
> 
> you mention an environmental variable. the step before you specify the steps to run to get to the result. you can specify a bash script that will allow you to put any 3rd party jar files, for us we used esri, on the cluster and propagate them to all nodes in the cluster as well. You can ping me off list if you need further help. Thing is I havent used pig but my boss and coworker wrote the mappers and reducers. to get these jars to the entire cluster was a super small and simple bash script. 
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-04-20 15:17, Billy Watson wrote: 
> 
> Hi,
> 
> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was very confusing, BTW and not enough searchable documentation on changes to the s3 stuff in hadoop 2.6 IMHO).
> 
> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not fail in pig, but rather fails in mapreduce with "Error: java.io.IOException: No FileSystem for scheme: s3n." 
> 
> I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail at 0% (before it ever gets to mapreduce) with a very similar "No fileystem for scheme s3n" error.
> 
> I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.
> 
> I appreciate any help, thanks!!
> 
> Stack trace:
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467) at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 
> -- Billy Watson
> 
> -- 
> 
> William Watson
> Software Engineer 
> (904) 705-7056 PCS

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Billy Watson <wi...@gmail.com>.
I appreciate the response. These JAR files aren't 3rd party. They're
included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
loaded by default and now they have to be loaded manually, if needed.

Essentially the problem boils down to:

- need to access s3n URLs
- cannot access without including the tools directory
- after including tools directory in HADOOP_CLASSPATH, failures start
happening later in job
- need to find right env variable (or shell script or w/e) to include
jets3t & other JARs needed to access s3n URLs (I think)



William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <ja...@eagleeyet.net>
wrote:

>  you mention an environmental variable. the step before you specify the
> steps to run to get to the result. you can specify a bash script that will
> allow you to put any 3rd party jar files, for us we used esri, on the
> cluster and propagate them to all nodes in the cluster as well. You can
> ping me off list if you need further help. Thing is I havent used pig but
> my boss and coworker wrote the mappers and reducers. to get these jars to
> the entire cluster was a super small and simple bash script.
>
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-04-20 15:17, Billy Watson wrote:
>
> Hi,
>
> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
> line without issue. I have set some options in hadoop-env.sh to make sure
> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
> confusing, BTW and not enough searchable documentation on changes to the s3
> stuff in hadoop 2.6 IMHO).
>
> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
> fail in pig, but rather fails in mapreduce with "Error:
> java.io.IOException: No FileSystem for scheme: s3n."
>
> I have added [hadoop-install-loc]/lib and
> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
> at 0% (before it ever gets to mapreduce) with a very similar "No fileystem
> for scheme s3n" error.
>
> I feel like at this point I just have to add the share/hadoop/tools/lib
> directory (and maybe lib) to the right environment variable, but I can't
> figure out which environment variable that should be.
>
> I appreciate any help, thanks!!
>
>
> Stack trace:
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
> org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
> at
> org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:415) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
> — Billy Watson
>
> --
>  William Watson
> Software Engineer
> (904) 705-7056 PCS
>
>
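
A minimal sketch of the classpath change described above, assuming a stock
Apache Hadoop 2.6 tarball unpacked under /opt/hadoop (the install path is an
assumption for illustration, not taken from the thread):

    # hadoop-env.sh -- expose the S3N filesystem classes and their
    # dependencies (hadoop-aws, jets3t, ...) to client-side JVMs.
    # /opt/hadoop is a placeholder install location; adjust as needed.
    export HADOOP_HOME=/opt/hadoop
    export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HADOOP_HOME}/share/hadoop/tools/lib/*"

Note that HADOOP_CLASSPATH only affects JVMs started by the hadoop/pig launcher
scripts on the client; YARN containers build their classpath from
mapreduce.application.classpath and yarn.application.classpath, which is one
likely explanation for a job that passes the client-side check but fails once
the map tasks start reading s3n:// paths.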

Re: Unable to Find S3N Filesystem Hadoop 2.6

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.
 

You mention an environment variable. In the step before you specify the
steps to run to get to the result, you can specify a bash script that
will let you put any 3rd party jar files (for us we used ESRI) on the
cluster and propagate them to all nodes in the cluster as well. You can
ping me off list if you need further help. Thing is, I haven't used Pig,
but my boss and coworker wrote the mappers and reducers; getting these
jars to the entire cluster was a super small and simple bash script.

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 15:17, Billy Watson wrote: 

> Hi,
> 
> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for hadoop 2.6 is set up correctly. (This was very confusing, BTW and not enough searchable documentation on changes to the s3 stuff in hadoop 2.6 IMHO).
> 
> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not fail in pig, but rather fails in mapreduce with "Error: java.io.IOException: No FileSystem for scheme: s3n." 
> 
> I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail at 0% (before it ever gets to mapreduce) with a very similar "No fileystem for scheme s3n" error.
> 
> I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.
> 
> I appreciate any help, thanks!!
> 
> Stack trace:
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467) at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 
> -- Billy Watson
> 
> -- 
> 
> William Watson
> Software Engineer 
> (904) 705-7056 PCS
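
The "super small and simple bash script" Jonathan mentions is not shown in the
thread; a rough sketch of that idea, with hypothetical host names and paths,
might look like this:

    #!/usr/bin/env bash
    # Hypothetical jar-propagation script (not the one used by Jonathan's team).
    # Copies the S3-related jars from the tools directory onto every node,
    # into a directory that is already on the default Hadoop classpath.
    NODES="node1 node2 node3"                 # placeholder host list
    SRC=/opt/hadoop/share/hadoop/tools/lib    # where hadoop-aws / jets3t live
    DEST=/opt/hadoop/share/hadoop/common/lib  # already on the default classpath

    for host in $NODES; do
      scp "$SRC"/hadoop-aws-*.jar "$SRC"/jets3t-*.jar "$host:$DEST/"
    done

On EMR the same idea is usually expressed as a bootstrap action that runs on
every node before the steps start; on a hand-rolled cluster a loop like the one
above (or rsync/pdsh) serves the same purpose.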
 
