Posted to common-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2017/08/22 13:00:26 UTC
hadoop 3 scripts & classpath setup
I'm having problems getting the s3 classpath setup on the CLI & am trying to work out what I'm doing wrong.
Without setting things up, you can't expect to talk to blobstores:
hadoop fs -ls wasb://something/
hadoop fs -ls s3a://landsat-pds/
That's expected. But what I can't do is get the AWS bits on the CP via HADOOP_OPTIONAL_TOOLS:
export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-adl,hadoop-openstack"
Once I do that, the wasb:// ls works (or at least doesn't throw a CNFE), but the s3a URL still fails.
If I add the line below to ~/.hadooprc, all becomes well:
hadoop_add_to_classpath_tools hadoop-aws
any ideas?
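For reference, a sketch of the workaround in full; ~/.hadooprc is sourced by the hadoop shell scripts at startup, so it can call the hadoop_* functions directly (the tool list here is just what I happen to use):

```shell
# ~/.hadooprc -- sourced by the hadoop shell scripts at startup.
# Pull the optional tools in, then force the hadoop-aws tools jars
# onto the classpath (the workaround; this shouldn't be needed).
export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-adl,hadoop-openstack"
hadoop_add_to_classpath_tools hadoop-aws
```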
Re: hadoop 3 scripts & classpath setup
Posted by Steve Loughran <st...@hortonworks.com>.
> On 25 Aug 2017, at 19:49, Allen Wittenauer <aw...@effectivemachines.com> wrote:
>
>
>> On Aug 25, 2017, at 10:00 AM, Steve Loughran <st...@hortonworks.com> wrote:
>>
>> Catching up on this. Looks like I don't have a hadoop-aws profile, which explains a lot, doesn't it.
>
> Yes. This is exactly the type of failure I'd expect.
>
>> How do those profiles get created/copied in?
>
> Maven kludgery.
>
> In a hadoop-tools sub-module pom.xml, you'll find an entry like this or similar:
>
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-dependency-plugin</artifactId>
>   <executions>
>     <execution>
>       <id>deplist</id>
>       <phase>compile</phase>
>       <goals>
>         <goal>list</goal>
>       </goals>
>       <configuration>
>         <!-- build a shellprofile -->
>         <outputFile>${project.basedir}/target/hadoop-tools-deps/${project.artifactId}.tools-optional.txt</outputFile>
>       </configuration>
>     </execution>
>   </executions>
> </plugin>
>
> The files generated by this entry get read by dev-support/bin/dist-tools-hooks-maker. That script is run as part of -Pdist in hadoop-dist. The outputFile name determines what kind of support hook it makes. (There were a lot of bad decisions made in nomenclature here. I take full responsibility for the confusion. But it makes more sense when one views the names from the perspective of the code in hadoop-functions.sh)
>
> All/most of this hackery should probably get replaced by something smarter in the hadoop-maven-plugin. But for the most part, this does work though and makes the end user experience significantly better.
>
>> I know there's an explicit s3guard entry now.
>>
>> hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
>>
>> ..do you think the presence of that entry is causing problems (i.e. stopping a hadoop-aws profile being created)?
>
> I can confirm that HADOOP-13345 doesn't get a shellprofile.d/hadoop-aws.sh created. That's not good. I don't have time right now to dig deep, but a few things pop into my head:
>
> * multiple org.apache.maven.plugins definitions in the pom.xml (do all of them get executed or just the last one?)
> * dist-tools-hooks-maker may only allow one of builtin or optional; may need to define a 3rd type that does a smart version of both
> * -Pdist may only allow one shellprofile.d dir per module ?
>
> If you want, file a jira and assign it to me. I'll try and dig into it next week.
Let me look at it first; I'd want this dealt with before doing the S3Guard branch merge so there's no regression in trunk.
>
---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org
Re: hadoop 3 scripts & classpath setup
Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Aug 25, 2017, at 10:00 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
> Catching up on this. Looks like I don't have a hadoop-aws profile, which explains a lot, doesn't it.
Yes. This is exactly the type of failure I'd expect.
> How do those profiles get created/copied in?
Maven kludgery.
In a hadoop-tools sub-module pom.xml, you'll find an entry like this or similar:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>deplist</id>
      <phase>compile</phase>
      <goals>
        <goal>list</goal>
      </goals>
      <configuration>
        <!-- build a shellprofile -->
        <outputFile>${project.basedir}/target/hadoop-tools-deps/${project.artifactId}.tools-optional.txt</outputFile>
      </configuration>
    </execution>
  </executions>
</plugin>
The files generated by this entry get read by dev-support/bin/dist-tools-hooks-maker. That script is run as part of -Pdist in hadoop-dist. The outputFile name determines what kind of support hook it makes. (There were a lot of bad decisions made in nomenclature here. I take full responsibility for the confusion. But it makes more sense when one views the names from the perspective of the code in hadoop-functions.sh)
All/most of this hackery should probably get replaced by something smarter in the hadoop-maven-plugin. But for the most part this does work, and it makes the end user experience significantly better.
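For reference, the generated hook ends up as a tiny shell profile. A rough sketch of what a generated shellprofile.d/hadoop-aws.sh contains (the classpath function name follows the _<profile>_hadoop_classpath convention that hadoop-functions.sh looks for; treat the exact body as illustrative):

```shell
# Sketch of a generated libexec/shellprofile.d/hadoop-aws.sh (illustrative).
# If the user opted in via HADOOP_OPTIONAL_TOOLS, register the profile...
if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
  hadoop_add_profile "hadoop-aws"
fi

# ...and the profile's classpath hook puts the tools jars on the classpath.
function _hadoop-aws_hadoop_classpath
{
  hadoop_add_to_classpath_tools hadoop-aws
}
```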
> I know there's an explicit s3guard entry now.
>
> hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
>
> ..do you think the presence of that entry is causing problems (i.e. stopping a hadoop-aws profile being created)?
I can confirm that HADOOP-13345 doesn't get a shellprofile.d/hadoop-aws.sh created. That's not good. I don't have time right now to dig deep, but a few things pop into my head:
* multiple org.apache.maven.plugins definitions in the pom.xml (do all of them get executed or just the last one?)
* dist-tools-hooks-maker may only allow one of builtin or optional; may need to define a 3rd type that does a smart version of both
* -Pdist may only allow one shellprofile.d dir per module ?
If you want, file a jira and assign it to me. I'll try and dig into it next week.
Re: hadoop 3 scripts & classpath setup
Posted by Steve Loughran <st...@hortonworks.com>.
On 22 Aug 2017, at 17:24, Allen Wittenauer <aw...@effectivemachines.com> wrote:
> Ugly error, but still no CNFE. So it at least works out of the box with a build from last week. I guess this is working? At this point, it’d probably be worthwhile to make sure that the libexec/shellprofile.d/hadoop-aws.sh on your system is in working order. In particular...
> =======================
> if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
>   hadoop_add_profile "hadoop-aws"
> fi
> =======================
> … is the magic code. It (effectively[2]) says that if HADOOP_OPTIONAL_TOOLS has hadoop-aws in it, then activate the hadoop-aws profile, which should end up calling hadoop_add_to_classpath_tools hadoop-aws. Might also be worthwhile to check simple stuff like permissions.
Catching up on this. Looks like I don't have a hadoop-aws profile, which explains a lot, doesn't it.
How do those profiles get created/copied in? I know there's an explicit s3guard entry now.
hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
..do you think the presence of that entry is causing problems (i.e. stopping a hadoop-aws profile being created)?
Re: hadoop 3 scripts & classpath setup
Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Aug 22, 2017, at 6:00 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
>
> I'm having problems getting the s3 classpath setup on the CLI & am trying to work out what I'm doing wrong.
>
>
> without setting things up, you can't expect to talk to blobstores
>
> hadoop fs -ls wasb://something/
> hadoop fs -ls s3a://landsat-pds/
>
> That's expected.
Yup.
> but what I can't do is get the AWS bits on the CP via HADOOP_OPTIONAL_TOOLS
>
> export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-aws,hadoop-adl,hadoop-openstack"
>
> Once I do that, the wasb:// ls works (or at least doesn't throw a CNFE), but the s3a URL still fails
Hmm. So HOT is getting processed at least somewhat then...
> If I add the line below to ~/.hadooprc, all becomes well:
>
> hadoop_add_to_classpath_tools hadoop-aws
>
> any ideas?
Setting HOT should be calling the equivalent of hadoop_add_to_classpath_tools hadoop-aws in the code path. Luckily, we have debugging tools in 3.x[1]:
First, let’s duplicate the failure conditions, but only activate hadoop-aws since it should be standalone and cuts our output down:
=======================
$ cat ~/.hadooprc
cat: /Users/aw/.hadooprc: No such file or directory
$ bin/hadoop envvars | grep CONF
HADOOP_CONF_DIR='/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/etc/hadoop'
$ pwd
/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT
$ grep OPTIONAL_TOOLS etc/hadoop/hadoop-env.sh
# export HADOOP_OPTIONAL_TOOLS="hadoop-aliyun,hadoop-aws,hadoop-azure,hadoop-azure-datalake,hadoop-kafka,hadoop-openstack"
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
=======================
Using --debug, let’s see what happens:
=======================
$ bin/hadoop --debug classpath 2>&1 | egrep '(tools|hadoop-aws)'
DEBUG: shellprofiles: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aliyun.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-archive-logs.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-archives.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aws.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-azure-datalake.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-azure.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-distcp.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-extras.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-gridmix.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-hdfs.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-httpfs.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-kafka.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-kms.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-mapreduce.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-openstack.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-rumen.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-streaming.sh /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-yarn.sh
DEBUG: Adding hadoop-aws to HADOOP_TOOLS_OPTIONS
DEBUG: Profiles: importing /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/bin/../libexec/shellprofile.d/hadoop-aws.sh
DEBUG: HADOOP_SHELL_PROFILES accepted hadoop-aws
DEBUG: Profiles: hadoop-aws classpath
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.134.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/java-xmlbuilder-0.4.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/jets3t-0.9.0.jar
DEBUG: Append CLASSPATH: /Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1-SNAPSHOT.jar
/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/etc/hadoop:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/common/lib/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/common/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.134.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/java-xmlbuilder-0.4.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/jets3t-0.9.0.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1-SNAPSHOT.jar:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs/lib/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/mapreduce/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/yarn/lib/*:/Users/aw/H/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/yarn/*
=======================
OK, the “extra” bits are definitely getting added. With the addition of the debug lines:
* the hadoop-aws profile and tools hooks are getting executed
* the hadoop-aws classpath function is getting executed (aka hadoop_add_to_classpath_tools hadoop-aws)
* the classpath isn’t rejecting any jars
* the final line definitely has AWS there.
So we should be good to go assuming the profile and supplemental tools code is correct.
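(Another quick sanity check is to split the final classpath on ':' and grep for the AWS jars directly; the classpath string below is a made-up illustration, not output from a real install:)

```shell
# On a real install: CP=$(bin/hadoop classpath)
# Hypothetical classpath string for illustration:
CP="/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1.jar"
# Split on ':' so each entry is on its own line, then filter.
echo "$CP" | tr ':' '\n' | grep 'hadoop-aws'
# -> /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.0.0-beta1.jar
```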
=======================
$ bin/hadoop fs -ls s3a://landsat-pds/
ls: Interrupted
=======================
umm, ok? No CNFE though. If I disable the network:
=======================
$ bin/hadoop fs -ls s3a://landsat-pds/
ls: doesBucketExist on landsat-pds: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
=======================
Ugly error, but still no CNFE. So it at least works out of the box with a build from last week. I guess this is working? At this point, it’d probably be worthwhile to make sure that the libexec/shellprofile.d/hadoop-aws.sh on your system is in working order. In particular...
=======================
if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
  hadoop_add_profile "hadoop-aws"
fi
=======================
… is the magic code. It (effectively[2]) says that if HADOOP_OPTIONAL_TOOLS has hadoop-aws in it, then activate the hadoop-aws profile, which should end up calling hadoop_add_to_classpath_tools hadoop-aws. Might also be worthwhile to check simple stuff like permissions.
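(hadoop_verify_entry is essentially a membership test on a space-padded list held in the named variable. A self-contained approximation, just to illustrate the behaviour; don't take it as the actual hadoop-functions.sh source:)

```shell
#!/usr/bin/env bash
# Stand-in for hadoop_verify_entry: succeed if $2 appears as a
# space-delimited entry in the variable whose name is $1.
hadoop_verify_entry() {
  [[ ${!1} =~ \ ${2}\  ]]
}

# HADOOP_TOOLS_OPTIONS ends up space-padded internally:
HADOOP_TOOLS_OPTIONS=" hadoop-aws hadoop-azure "
if hadoop_verify_entry HADOOP_TOOLS_OPTIONS "hadoop-aws"; then
  echo "hadoop-aws profile would be activated"
fi
```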
[1] It’s tempting to say “now”, but given that debug was added several years ago, it’s more like branch-2 is just really ancient rather than 3.x being "current".
[2] yes, that variable is supposed to be HADOOP_TOOLS_OPTIONS. HOT gets transformed into HADOOP_OPTIONAL_TOOLS internally for “reasons”. It’s a longer discussion that most people aren’t interested in.