Posted to common-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2017/08/22 13:00:26 UTC
hadoop 3 scripts & classpath setup
I'm having problems getting the s3 classpath setup on the CLI & am trying to work out what I'm doing wrong.
Without setting things up, you can't expect to talk to blobstores:
hadoop fs -ls wasb://something/
hadoop fs -ls s3a://landsat-pds/
That's expected. But what I can't do is get the AWS bits on the CP via HADOOP_OPTIONAL_TOOLS:
export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-adl,hadoop-openstack"
Once I do that, the wasb:// ls works (or at least doesn't throw a CNFE), but the s3a URL still fails.
If I add the line below to ~/.hadooprc, all becomes well:
hadoop_add_to_classpath_tools hadoop-aws
any ideas?
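For reference, a sketch of the workaround in full; ~/.hadooprc is sourced by the hadoop shell scripts at startup, so it can call the hadoop_* functions directly (the tool list here is just what I happen to use):

```shell
# ~/.hadooprc -- sourced by the hadoop shell scripts at startup.
# Pull the optional tools in, then force the hadoop-aws tools jars
# onto the classpath (the workaround; this shouldn't be needed).
export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-adl,hadoop-openstack"
hadoop_add_to_classpath_tools hadoop-aws
```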
Re: hadoop 3 scripts & classpath setup
Posted by Steve Loughran <st...@hortonworks.com>.
> On 25 Aug 2017, at 19:49, Allen Wittenauer <aw...@effectivemachines.com> wrote:
>
>
>> On Aug 25, 2017, at 10:00 AM, Steve Loughran <st...@hortonworks.com> wrote:
>>
>> Catching up on this. Looks like I don't have a hadoop-aws profile, which explains a lot, doesn't it.
>
> Yes. This is exactly the type of failure I'd expect.
>
>> How do those profiles get created/copied in?
>
> Maven kludgery.
>
> In a hadoop-tools sub-module pom.xml, you'll find an entry like this or similar:
>
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-dependency-plugin</artifactId>
>   <executions>
>     <execution>
>       <id>deplist</id>
>       <phase>compile</phase>
>       <goals>
>         <goal>list</goal>
>       </goals>
>       <configuration>
>         <!-- build a shellprofile -->
>         <outputFile>${project.basedir}/target/hadoop-tools-deps/${project.artifactId}.tools-optional.txt</outputFile>
>       </configuration>
>     </execution>
>   </executions>
> </plugin>
>
> The files generated by this entry get read by dev-support/bin/dist-tools-hooks-maker. That script is run as part of -Pdist in hadoop-dist. The outputFile name determines what kind of support hook it makes. (There were a lot of bad decisions made in nomenclature here. I take full responsibility for the confusion. But it makes more sense when one views the names from the perspective of the code in hadoop-functions.sh)
>
> All/most of this hackery should probably get replaced by something smarter in the hadoop-maven-plugin. But for the most part, this does work though and makes the end user experience significantly better.
>
>> I know there's an explicit s3guard entry now.
>>
>> hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
>>
>> ..do you think the presence of that entry is causing problems (i.e. stopping a hadoop-aws profile being created)?
>
> I can confirm that HADOOP-13345 doesn't get a shellprofile.d/hadoop-aws.sh created. That's not good. I don't have time right now to dig deep, but a few things pop into my head:
>
> * multiple org.apache.maven.plugins definitions in the pom.xml (do all of them get executed or just the last one?)
> * dist-tools-hooks-maker may only allow one of builtin or optional; may need to define a 3rd type that does a smart version of both
> * -Pdist may only allow one shellprofile.d dir per module ?
>
> If you want, file a jira and assign it to me. I'll try and dig into it next week.
Let me look at it first; I'd want this dealt with before doing the S3Guard branch merge so there's no regression in trunk.
>
---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org
Re: hadoop 3 scripts & classpath setup
Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Aug 25, 2017, at 10:00 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
> Catching up on this. Looks like I don't have a hadoop-aws profile, which explains a lot, doesn't it.
Yes. This is exactly the type of failure I'd expect.
> How do those profiles get created/copied in?
Maven kludgery.
In a hadoop-tools sub-module pom.xml, you'll find an entry like this or similar:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>deplist</id>
      <phase>compile</phase>
      <goals>
        <goal>list</goal>
      </goals>
      <configuration>
        <!-- build a shellprofile -->
        <outputFile>${project.basedir}/target/hadoop-tools-deps/${project.artifactId}.tools-optional.txt</outputFile>
      </configuration>
    </execution>
  </executions>
</plugin>
The files generated by this entry get read by dev-support/bin/dist-tools-hooks-maker. That script is run as part of -Pdist in hadoop-dist. The outputFile name determines what kind of support hook it makes. (There were a lot of bad decisions made in nomenclature here. I take full responsibility for the confusion. But it makes more sense when one views the names from the perspective of the code in hadoop-functions.sh)
All/most of this hackery should probably get replaced by something smarter in the hadoop-maven-plugin. But for the most part this does work, and it makes the end user experience significantly better.
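For reference, the generated hook ends up as a tiny shell profile. A rough sketch of what a generated shellprofile.d/hadoop-aws.sh contains (the classpath function name follows the _<profile>_hadoop_classpath convention that hadoop-functions.sh looks for; treat the exact body as illustrative):

```shell
# Sketch of a generated libexec/shellprofile.d/hadoop-aws.sh (illustrative).
# If the user opted in via HADOOP_OPTIONAL_TOOLS, register the profile...
if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
  hadoop_add_profile "hadoop-aws"
fi

# ...and the profile's classpath hook puts the tools jars on the classpath.
function _hadoop-aws_hadoop_classpath
{
  hadoop_add_to_classpath_tools hadoop-aws
}
```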
> I know there's an explicit s3guard entry now.
>
> hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
>
> ..do you think the presence of that entry is causing problems (i.e. stopping a hadoop-aws profile being created)?
I can confirm that HADOOP-13345 doesn't get a shellprofile.d/hadoop-aws.sh created. That's not good. I don't have time right now to dig deep, but a few things pop into my head:
* multiple org.apache.maven.plugins definitions in the pom.xml (do all of them get executed or just the last one?)
* dist-tools-hooks-maker may only allow one of builtin or optional; may need to define a 3rd type that does a smart version of both
* -Pdist may only allow one shellprofile.d dir per module ?
If you want, file a jira and assign it to me. I'll try and dig into it next week.
Re: hadoop 3 scripts & classpath setup
Posted by Steve Loughran <st...@hortonworks.com>.
On 22 Aug 2017, at 17:24, Allen Wittenauer <aw...@effectivemachines.com> wrote:
> Ugly error, but still no CNFE. So it at least works out of the box with a build from last week. I guess this is working? At this point, it’d probably be worthwhile to make sure that the libexec/shellprofile.d/hadoop-aws.sh on your system is in working order. In particular...
> =======================
> if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
>   hadoop_add_profile "hadoop-aws"
> fi
> =======================
> … is the magic code. It (effectively[2]) says that if HADOOP_OPTIONAL_TOOLS has hadoop-aws in it, then activate the hadoop-aws profile, which should end up calling hadoop_add_to_classpath_tools hadoop-aws. Might also be worthwhile to check simple stuff like permissions.
Catching up on this. Looks like I don't have a hadoop-aws profile, which explains a lot, doesn't it.
How do those profiles get created/copied in? I know there's an explicit s3guard entry now.
hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
..do you think the presence of that entry is causing problems (i.e. stopping a hadoop-aws profile being created)?
Re: hadoop 3 scripts & classpath setup
Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Aug 22, 2017, at 6:00 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
>
> I'm having problems getting the s3 classpath setup on the CLI & am trying to work out what I'm doing wrong.
>
>
> without setting things up, you can't expect to talk to blobstores
>
> hadoop fs -ls wasb://something/
> hadoop fs -ls s3a://landsat-pds/
>
> That's expected.
Yup.
> but what I can't do is get the AWS bits on the CP via HADOOP_OPTIONAL_TOOLS
>
> export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-adl,hadoop-openstack"
>
> Once I do that, the wasb:// ls works (or at least doesn't throw a CNFE), but the s3a URL still fails
Hmm. So HOT is getting processed at least somewhat then...
> If I add the line below to ~/.hadooprc, all becomes well:
>
> hadoop_add_to_classpath_tools hadoop-aws
>
> any ideas?
Setting HOT should be calling the equivalent of hadoop_add_to_classpath_tools hadoop-aws in the code path. Luckily, we have debugging tools in 3.x[1]:
First, let’s duplicate the failure conditions, but only activate hadoop-aws since it should be standalone and cuts our output down:
=======================
$ cat ~/.hadooprc
cat: /Users/aw/.hadooprc: No such file or directory
$ bin/hadoop envvars | grep CONF
HADOOP_CONF_DIR='/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/etc/hadoop'
$ pwd
/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT
$ grep OPTIONAL_TOOLS etc/hadoop/hadoop-env.sh
# export HADOOP_OPTIONAL_TOOLS="hadoop-aliyun,hadoop-aws,hadoop-azure,hadoop-azure-datalake,hadoop-kafka,hadoop-openstack"
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
=======================
Using --debug, let’s see what happens:
=======================
$ bin/hadoop --debug classpath 2>&1 | egrep '(tools|hadoop-aws)'
DEBUG: shellprofiles: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aliyun.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-archive-logs.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-archives.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aws.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-azure-datalake.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-azure.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-distcp.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-extras.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-gridmix.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-hdfs.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-httpfs.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-kafka.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-kms.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-mapreduce.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-openstack.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-rumen.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-streaming.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-yarn.sh
DEBUG: Adding hadoop-aws to HADOOP_TOOLS_OPTIONS
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aws.sh
DEBUG: HADOOP_SHELL_PROFILES accepted hadoop-aws
DEBUG: Profiles: hadoop-aws classpath
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.134.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/java-xmlbuilder-0.4.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/jets3t-0.9.0.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1-SNAPSHOT.jar
/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/etc/hadoop:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/common/lib/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/common/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.134.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/java-xmlbuilder-0.4.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/jets3t-0.9.0.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1-SNAPSHOT.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs/lib/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/mapreduce/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/yarn/lib/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/yarn/*
=======================
OK, the “extra” bits are definitely getting added. With the addition of the debug lines:
* the hadoop-aws profile and tools hooks are getting executed
* the hadoop-aws classpath function is getting executed (aka hadoop_add_to_classpath_tools hadoop-aws)
* the classpath isn’t rejecting any jars
* the final line definitely has AWS there.
So we should be good to go assuming the profile and supplemental tools code is correct.
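(Another quick sanity check is to split the final classpath on ':' and grep for the AWS jars directly; the classpath string below is a made-up illustration, not output from a real install:)

```shell
# On a real install: CP=$(bin/hadoop classpath)
# Hypothetical classpath string for illustration:
CP="/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1.jar"
# Split on ':' so each entry is on its own line, then filter.
echo "$CP" | tr ':' '\n' | grep 'hadoop-aws'
# -> /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1.jar
```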
=======================
$ bin/hadoop fs -ls s3a://landsat-pds/
ls: Interrupted
=======================
umm, ok? No CNFE though. If I disable the network:
=======================
$ bin/hadoop fs -ls s3a://landsat-pds/
ls: doesBucketExist on landsat-pds: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
=======================
Ugly error, but still no CNFE. So it at least works out of the box with a build from last week. I guess this is working? At this point, it’d probably be worthwhile to make sure that the libexec/shellprofile.d/hadoop-aws.sh on your system is in working order. In particular...
=======================
if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
  hadoop_add_profile "hadoop-aws"
fi
=======================
… is the magic code. It (effectively[2]) says that if HADOOP_OPTIONAL_TOOLS has hadoop-aws in it, then activate the hadoop-aws profile, which should end up calling hadoop_add_to_classpath_tools hadoop-aws. Might also be worthwhile to check simple stuff like permissions.
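(hadoop_verify_entry is essentially a membership test on a space-padded list held in the named variable. A self-contained approximation, just to illustrate the behaviour; don't take it as the actual hadoop-functions.sh source:)

```shell
#!/usr/bin/env bash
# Stand-in for hadoop_verify_entry: succeed if $2 appears as a
# space-delimited entry in the variable whose name is $1.
hadoop_verify_entry() {
  [[ ${!1} =~ \ ${2}\  ]]
}

# HADOOP_TOOLS_OPTIONS ends up space-padded internally:
HADOOP_TOOLS_OPTIONS=" hadoop-aws hadoop-azure "
if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
  echo "hadoop-aws profile would be activated"
fi
```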
[1] It’s tempting to say “now”, but given that debug was added several years ago, it’s more like branch-2 is just really ancient rather than 3.x being "current".
[2] yes, that variable is supposed to be HADOOP_TOOLS_OPTIONS. HOT gets transformed into HADOOP_OPTIONAL_TOOLS internally for “reasons”. It’s a longer discussion that most people aren’t interested in.