You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Laszlo Gaal (Code Review)" <ge...@cloudera.org> on 2017/10/17 19:39:39 UTC

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Laszlo Gaal has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8294


Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

JENKINS-1102 added the IAM role ImpalaDev to the Impala Jenkins workers
to facilitate s3 access without having to carry around AWS credentials
in environment variables, from where they were prone to escape to log
files posted in public places.

This change paves the way for Impala build and test jobs to use the IAM
roles to access s3 buckets. There are a few minor changes that allow
this to happen:

Changes to the configuration script:
1. bin/impala-config.sh stops setting the AWS_* environment variables
   to dummy default values. When AWS credentials are not supplied in
   the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY,
   these variables are unset (removed from the environment), otherwise
   they would preempt authentication based on the IAM role.
2. Having AWS credentials in the AWS_* environment variables is now
   optional. They are still accepted to allow for private test runs
   accessing private/nondefault buckets with custom credentials.
3. bin/impala-config.sh now checks if credentials are supplied in the
   AWS_* variables or via the IAM role.

Changes to the frontend tests:
1. Some front-end tests still referenced the old s3 connector s3n:,
   this connector does not support s3 auth via IAM roles. These
   locations are changed to use the newer s3a:, which is the connector
   capable of using IAM roles for authentication and which is used
   in all other code locations.

Changes to the minicluster setup:
1. As a corollary the s3n: configuration sections are removed from
   core-site.xml.tmpl.
2. Remove empty AWS credentials from core-site.xml.tmpl:

   The minicluster setup script susbstitutes values from environment
   variables into Hadoop *-site.xml config files when setting up
   the minicluster runtime environment. The configuration file
   core-site.xml.tmpl contains a section for s3 access, including
   AWS credentials.

   Impala can now use IAM roles for s3 access; this requires the removal
   of environment variables holding AWS credentials, which
   1. breaks the substitution logic in testdata/cluster/admin, and
   2. would break the IAM-based credentials if empty credentials were
      supplied in core-site.xml

   The fix for all of the above issues is to remove the AWS credential
   settings from the generated core-site.xml if both AWS_ACCESS_KEY_ID and
   AWS_SECRET_ACCESS_KEY environment variables are absent or empty.

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
---
M bin/impala-config.sh
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/cluster/admin
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
5 files changed, 52 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8294/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 5
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Dec 2017 22:10:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8294/3/bin/check-s3-access.sh
File bin/check-s3-access.sh:

http://gerrit.cloudera.org:8080/#/c/8294/3/bin/check-s3-access.sh@56
PS3, Line 56: 
> you don't need the semicolon
Done


http://gerrit.cloudera.org:8080/#/c/8294/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3001
PS3, Line 3001: ype 
This test is removed as the s3n: URL could not be changed to s3a:, that would have defeated the goal of the test. Coverage for invalid, but still parseable URLs is still provided by the remaining test step checking 'file:///' URLs.


http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@59
PS3, Line 59:  <property>
> Is there any reason not to s/s3a/s3n/ in the tests so that we can get rid o
Done.

The credentials section is now completely removed (it can still be added manually if someone really needs it); dummy credentials required by the FE tests are supplied via environment variables. s3a: can take this form; s3n:, which can only understand the entries in core-site.xml has been removed from the codebase.

I had to remove a single test from AnalyzeStmtsTest.java so that s3n: could be completely removed; I left some notes there.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Nov 2017 14:01:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@a3000
PS4, Line 3000: 
> Can we restore this?  LOAD DATA INPATH doesn't get coverage in the other te
I'm facing a problem with this one.
Note, that the original test here is a negative one: it verifies that LOAD DATA INPATH rejects an s3n: filesystem reference.
With s3n dummy credentials removed from the minicluster configuration and Hadoop3 removing the s3n provider completely the negative test will still fail, but with a different error/exception, and the test would not be relevant any more.

I tried to change it to a positive test using AnalyzesOk() and an s3a: reference. The problem there was that the s3a: provider seems to insist on hitting the S3 object URI encoded in the test. That means that a successful test step needs an existing S3 bucket and object ("file") and valid credentials to access the file. I think it would be problematic if FE unit tests suddenly started requiring valid S3 credentials.
I could make the S3 bucket world-readable, but it still feels awkward.

Also, the end2end tests actually test various LOAD DATA INPATH scenarios, e.g. https://github.com/cloudera/Impala/blob/cdh5-trunk/testdata/workloads/functional-query/queries/QueryTest/load.test#L35
These tests use the FILESYSTEM_PREFIX environment variable, so they run against S3 (specifically s3a:) when the tests are run on S3.

Would you be comfortable with disabling this test step and following it up in a separate Jira?



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Dec 2017 23:38:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Michael Brown, Jim Apple, Philip Zeyliger, Sailesh Mukil, David Knupp, Joe McDonnell, Tim Armstrong, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8294

to look at the new patch set (#3).

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
  in core-site.xml.
- using proprietary management tools to store Amazon access keys
  securely
- using Amazon IAM roles bound to VMs running in EC2.

The development minicluster environment used the first approach,
which risked leaking these keys.

This change enables Impala builds to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that environment variables and/or Jenkins
parameters carrying the traditional AWS credentials do not conflict with
credentials supplied by the IAM role attached to the VM instance.

The change also moves the logic performing the S3 access checks into a separate
script file: bin/check-s3-access.sh.

IAM role based credentials are accessible through the EC2
instance-property mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

Changes to the configuration script:
1. bin/impala-config.sh stops setting the AWS_* environment variables
   to dummy default values. When AWS credentials are not supplied in
   the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY,
   these variables are unset (removed from the environment), otherwise
   they would interfere with authentication based on the IAM role.
2. Having AWS credentials in the AWS_* environment variables is now
   optional. They are still accepted to allow for private test runs
   accessing private/nondefault buckets with custom credentials.
3. bin/impala-config.sh now calls bin/check-s3-access.sh to perform the actual
   S3-dependent checks. check-s3-access.sh contains the S3-specific logic and
   network access needed to check if the requested S3 bucket is accessible
   for the build.

Changes to the minicluster configuration:
1. Security credentials for the s3n: connector, located in core-site.xml
   are no longer replaced with actual AWS_ credentials when configuring
   the minicluster. These parameters are used for some front-end tests,
   which don't actually reach out to S3, the s3n: notation just simulates
   non-HDFS storage.
   For these tests to work s3n: authentication parameters still need to
   exist in core-site.xml. Their values do not matter, so the configuration
   template now has fixed dummy values for these parameters.

2. Remove empty s3a: security parameter sections from core-site.xml:
   The testdata/cluster/admin setup script substitutes values from
   environment variables into core-site.xml when it sets up the minicluster
   runtime environment.

   The configuration section for s3a: credentials is now completely
   removed if both of the following conditions are met:
   - the target filesystem is set to "s3"
   - the AWS credential environment variables AWS_ACCESS_KEY_ID
     and AWS_SECRET_ACCESS_KEY are both empty or missing.

   The configuration file core-site.xml.tmpl is extended with
   comment markers that delimit the section to be removed in this case.

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
---
A bin/check-s3-access.sh
M bin/impala-config.sh
M testdata/cluster/admin
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
4 files changed, 163 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8294/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@240
PS1, Line 240: #export AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY-}"
             : #export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID-}"
Self-inflicted review finding: remove commented-out lines.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Oct 2017 19:45:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 3: Code-Review+1

Thanks for humouring me :).


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 23:52:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294:   if (set +x; [[ -z ${AWS_ACCESS_KEY_ID-} && -z ${AWS_SECRET_ACCESS_KEY-} ]]); then
> I'm not sure that putting the checks into a separate script would change an
Lines 252-264 seem ok - I don't have a problem with setting environment variables in this script.


http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294:   if (set +x; [[ -z ${AWS_ACCESS_KEY_ID-} && -z ${AWS_SECRET_ACCESS_KEY-} ]]); then
> In most of the use cases the network access is avoided, it happens only if 
Yeah, I'd question whether "aws ls" belongs in this script either.

The performance is one thing but I just generally think we should restrict this script to doing the minimum possible to set up environment variables. I don't see why it should be extended to do heavyweight things to validate the configuration - we don't do anything to validate the vast majority of other variables that are set in this script.

It does appear that people have added validations in an ad-hoc way before so if a majority of people think that that's a good idea, I can yield to that.

We should also keep in mind that this script is a crappy programming environment, because it's running in the context of the user's shell and we can't use things like "set -x" and have to be careful setting variables that we don't intend to leak into the user's shell.

So I think regardless this logic would be more maintainable in a separate script to validate the AWS config. My preference is that we also run that script from elsewhere to keep impala-config.sh lightweight but if other people feel strongly that impala-config.sh should be doing more validation of configs, etc then that's not the worst thing in the world.


http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@303
PS2, Line 303: CURL_ARGS
This variable will leak out into the user's shell.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Nov 2017 02:21:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256
PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then
> We may want to include a check here that "set -x" has not been enabled?
(Bash hackery)

If you want, you can use a subshell:

$(set -x; (set +x; [ $USER == philip ])) && echo yes || echo no
+ set +x
yes
10:46:02 mystery  ~
$(set -x; (set +x; [ $USER == phili ])) && echo yes || echo no
+ set +x
no


Return values survive, and you can just do

if (set +x; [[ -z ${AWS_SECRET_ACCESS_KEY_ID-} ]]); then
  ...
fi

I tested something similar above.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:47:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Michael Brown, Jim Apple, Philip Zeyliger, Sailesh Mukil, David Knupp, Joe McDonnell, Tim Armstrong, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8294

to look at the new patch set (#4).

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
  in core-site.xml.
- using proprietary management tools to store Amazon access keys
  securely
- using Amazon IAM roles bound to VMs running in EC2.

The development minicluster environment used the first approach,
which risked leaking these keys.

This change enables Impala builds to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that environment variables carrying
the traditional AWS credentials do not conflict with credentials supplied
by the IAM role attached to the VM instance.

IAM role based credentials are accessible through the EC2
instance-property mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

The change also removes the remaining references to the s3n: provider.
In the FE tests all URIs referring to s3n: are replaced with their
s3a: equivalents, except for a single negative test in
AnalyzeStmtsTest.java, which is removed.

In addition to the code changes, the s3n: and s3a: credential properties
are also removed from core-site.xml.tmpl. The s3a: provider can pick up
AWS S3 credentials from environment variables or IAM properties bound
to the VM instance, which is a more flexible approach.

As environment variables have precedence over IAM roles, care must be
taken when managing the canonical environment variables carrying
AWS credentials. There are two requirements to be reconciled:
1. The FE tests have code that examines s3a: URIs; this code needs
   existing, but not necessarily valid AWS credentials.
2. When the Impala test suite is executed on an EC2 VM, AWS credentials
   can be supplied via IAM roles. These credentials can be used only
   if the AWS_* environment variables are unset (do not exist).

The tradeoff is managed following these rules:
1. When AWS_* environment variables are set before invoking the
   Impala configuration scripts, their value is preserved and
   the config scripts ensure that the variables are exported.
2. If the AWS_* variables are missing or empty, they will be unset
   to ensure that credentials supplied by Amazon's IAM roles can be
   accessed,
3. except if the scripts are running outside of EC2 (so there can be
   no IAM roles) and TARGET_FILESYSTEM is not set "s3". This combination
   is most often the case on a developer's local workstation.
   In this case the AWS_* credential variables are forcibly set to
   dummy values to allow the FE tests to succeed.
   The removal of S3 credential parameters from core-site.xml[.tmpl]
   also allows users to set up their own credentials there,
   the config scripts will not change those settings.

Environment variables carrying AWS security credentials will be set
up according to the following table:

    Instance:     Running outside EC2 ||  Running in EC2 |
--------------------+--------+--------++--------+--------+
  TARGET_FILESYSTEM |   S3   | not S3 ||   S3   | not S3 |
--------------------+--------+--------++--------+--------+
                    |        |        ||        |        |
              empty | unset  | dummy  ||  unset |  unset |
AWS_*               |        |        ||        |        |
env   --------------+--------+--------++--------+--------+
var                 |        |        ||        |        |
          not empty | export | export || export | export |
                    |        |        ||        |        |
--------------------+--------+--------++--------+--------+

Legend: unset:  the variable is unset
        export: the variable is exported with its current value
        dummy:  the variable is set to a preset dummy value and
                exported

Running on an EC2 VM is indicated by setting RUNNING_IN_EC2 to "true" and
exporting it before impala_config.sh is invoked.

The change also moves the logic performing the S3 access checks into a separate
script file: bin/check-s3-access.sh. This file now contains all the S3-specific
logic and network access to check if the requested S3 bucket can be accessed.

Testing:

Performed local builds for HDFS as well as automated builds against
HDFS and S3, using both IAM roles and explicit AWS_* credentials for
authentication.
Verified that FE tests that parse s3a: URLs are still successful in
all these combinations (when they are run).

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
---
A bin/check-s3-access.sh
M bin/impala-config.sh
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
5 files changed, 170 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8294/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: 169.254.169.254
What is this address? Can we use a domain name and https?

What would a new Impala developer get as a result of this?



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Oct 2017 20:55:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

Hi Laszlo,

We're interested in this change for avoiding some compatibility problems with Hadoop 3. Any news here?

Thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Oct 2017 23:03:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
  in core-site.xml.
- using proprietary management tools to store Amazon access keys
  securely
- using Amazon IAM roles bound to VMs running in EC2.

The development minicluster environment used the first approach,
which risked leaking these keys.

This change enables Impala builds to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that environment variables carrying
the traditional AWS credentials do not conflict with credentials supplied
by the IAM role attached to the VM instance.

IAM role based credentials are accessible through the EC2
instance-property mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

The change also removes the remaining references to the s3n: provider.
In the FE tests all URIs referring to s3n: are replaced with their
s3a: equivalents, except for a single negative test in
AnalyzeStmtsTest.java, which is removed.

In addition to the code changes, the s3n: and s3a: credential properties
are also removed from core-site.xml.tmpl. The s3a: provider can pick up
AWS S3 credentials from environment variables or IAM properties bound
to the VM instance, which is a more flexible approach.

As environment variables have precedence over IAM roles, care must be
taken when managing the canonical environment variables carrying
AWS credentials. There are two requirements to be reconciled:
1. The FE tests have code that examines s3a: URIs; this code needs
   existing, but not necessarily valid AWS credentials.
2. When the Impala test suite is executed on an EC2 VM, AWS credentials
   can be supplied via IAM roles. These credentials can be used only
   if the AWS_* environment variables are unset (do not exist).

The tradeoff is managed following these rules:
1. When AWS_* environment variables are set before invoking the
   Impala configuration scripts, their value is preserved and
   the config scripts ensure that the variables are exported.
2. If the AWS_* variables are missing or empty, they will be unset
   to ensure that credentials supplied by Amazon's IAM roles can be
   accessed,
3. except if the scripts are running outside of EC2 (so there can be
   no IAM roles) and TARGET_FILESYSTEM is not set "s3". This combination
   is most often the case on a developer's local workstation.
   In this case the AWS_* credential variables are forcibly set to
   dummy values to allow the FE tests to succeed.
   The removal of S3 credential parameters from core-site.xml[.tmpl]
   also allows users to set up their own credentials there,
   the config scripts will not change those settings.

Environment variables carrying AWS security credentials will be set
up according to the following table:

    Instance:     Running outside EC2 ||  Running in EC2 |
--------------------+--------+--------++--------+--------+
  TARGET_FILESYSTEM |   S3   | not S3 ||   S3   | not S3 |
--------------------+--------+--------++--------+--------+
                    |        |        ||        |        |
              empty | unset  | dummy  ||  unset |  unset |
AWS_*               |        |        ||        |        |
env   --------------+--------+--------++--------+--------+
var                 |        |        ||        |        |
          not empty | export | export || export | export |
                    |        |        ||        |        |
--------------------+--------+--------++--------+--------+

Legend: unset:  the variable is unset
        export: the variable is exported with its current value
        dummy:  the variable is set to a preset dummy value and
                exported

Running on an EC2 VM is indicated by setting RUNNING_IN_EC2 to "true" and
exporting it before impala_config.sh is invoked.

The change also moves the logic performing the S3 access checks into a separate
script file: bin/check-s3-access.sh. This file now contains all the S3-specific
logic and network access to check if the requested S3 bucket can be accessed.

Testing:

Performed local builds for HDFS as well as automated builds against
HDFS and S3, using both IAM roles and explicit AWS_* credentials for
authentication.
Verified that FE tests that parse s3a: URLs are still successful in
all these combinations (when they are run).

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Reviewed-on: http://gerrit.cloudera.org:8080/8294
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Reviewed-by: Zach Amsden <za...@cloudera.com>
Tested-by: Impala Public Jenkins
---
A bin/check-s3-access.sh
M bin/impala-config.sh
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
5 files changed, 177 insertions(+), 53 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, but someone else must approve
  Zach Amsden: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 6
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/check-s3-access.sh
File bin/check-s3-access.sh:

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/check-s3-access.sh@82
PS4, Line 82:   # an EC2 VM) or the IAM role cannot be retrieved.
> Please make this IP address a shell variable and set it above right after y
Done

Also added a brief description of the instance metadata access mechanism and a link to the relevant AWS doc page.


http://gerrit.cloudera.org:8080/#/c/8294/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/impala-config.sh@345
PS4, Line 345: if (set +x; [[ -n ${AWS_SESSION_TOKEN-} ]]); then
> It's confusing that this is backwards from line 334 and 324. i.e., in the f
Done


http://gerrit.cloudera.org:8080/#/c/8294/4/bin/impala-config.sh@353
PS4, Line 353:     export DEFAULT_FS="s3a://${S3_BUCKET}"
             :     export FILESYSTEM
> nit:
Done


http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@a3000
PS4, Line 3000: 
> If we believe this is tested in the e2e tests, I think it's fine to remove 
For the record: I ended up removing it after running load.test both for S3 and HDFS.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 5
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Dec 2017 17:00:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> >[...]Please do answer the question about changing
>  > the FE tests to just use s3a so that we can remove the hackery
>  > altogether.[...]
> 
> In short: it depends on your end goal.
> If the end goal is to get rid of the s3n: provider and its supporting menagerie (jets3t, etc.), then yes, it can be done with a small price: a single negative test in AnalyzeStmtsTest.java will have to go. The code there in https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L3000 performs a test step that should fail for s3n -- but if I remove the s3n properties from core-site.xml, this line blows up complaining about missing credentials. The stack trace reveals that Impala's HdfsUri::analyze() calls into Hadoop's Path::getFileSystem() to find out what kind of filesystem the URI points to; this in turn calls down all the way to org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74) being called from Jets3t.
> 
> If your end goal is to remove dummy credentials from core-site.xml.tmpl, this will probably not be complete success: the s3a: provider behaves in a similar () though not identical) way. When the front-end tests are run without an S3 container in sight (e.g. your local dev box with no S3 access set up), AnalyzeDDLTest's s3a: URIs still require similar dummy credentials, but with s3a: these can be supplied in environment variables. This means that core-site.xml[.tmpl] could be cleaned up, but dummy credentials in envireonment variables would still be there.

Hadoop has essentially deprecated s3n and it's removed entirely in Hadoop-3.0. (See HADOOP-14738.) The distribution of Hadoop we're depending on also prefers s3a. I think we can do away with s3n entirely, but I'm happy to take that on as a separate change after yours here.


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Nov 2017 21:25:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@a3000
PS4, Line 3000: 
> I'm facing a problem with this one.
If we believe this is tested in the e2e tests, I think it's fine to remove this here.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Dec 2017 04:46:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@a3000
PS4, Line 3000: 
> For the record: I ended up removing it after running load.test both for S3 
This seems reasonable.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Dec 2017 22:10:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256
PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then
We may want to include a check here that "set -x" has not been enabled?

# Prevent leaking the AWS keys to the log
set_x=0
if set +o | grep -q "set -o xtrace"; then
  set_x=1
  set +x
fi

DO STUFF

# Restore xtrace flag
if [[ $set_x -eq 1 ]]; then
  set -x
fi



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:32:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8294/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8294/1//COMMIT_MSG@9
PS1, Line 9: JENKINS-1102 added the IAM role ImpalaDev to the Impala Jenkins workers
           : to facilitate s3 access without having to carry around AWS credentials
           : in environment variables, from where they were prone to escape to log
           : files posted in public places.
This looks like something specific to a downstream environment and should be removed from the commit message.


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@268
PS1, Line 268: test
tests


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: curl -sf --connect-timeout 1 --max-time 5
Are these curl options common across a wide variety of OSs? If they are newer and only work under, say, Ubuntu 16, they would be a problem for contributors using older distributions.


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: http://169.254.169.254/latest/meta-data/iam/security-credentials/
Is there a AWS reference page you can add as a comment so I can read up more on this?


http://gerrit.cloudera.org:8080/#/c/8294/1/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/8294/1/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@61
PS1, Line 61: <!-- BEGIN s3 security settings -->
Maybe have a comment explaining testdata/cluster/admin needs this (and the END) as a marker and not to remove?



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Oct 2017 23:09:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/check-s3-access.sh
File bin/check-s3-access.sh:

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/check-s3-access.sh@82
PS4, Line 82:   WGET_ARGS+=(http://169.254.169.254/latest/meta-data/iam/security-credentials/)
Please make this IP address a shell variable and set it above right after you reference the docs.aws.amazon.com link.


http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@a3000
PS4, Line 3000: 
Can we restore this?  LOAD DATA INPATH doesn't get coverage in the other test, and needs to be able to take S3 URIs.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Dec 2017 01:09:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 4: Code-Review+1

(2 comments)

Works for me. Thanks! The two nits below are very minor.

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/4/bin/impala-config.sh@345
PS4, Line 345: if (set +x; [[ -z ${AWS_SESSION_TOKEN-} ]]); then
It's confusing that this is backwards from line 334 and 324. i.e., in the first two cases, the first stanza was "if set, export", but here it's "if unset ... else export". I'd prefer you invert the if, but am happy to not review that again.


http://gerrit.cloudera.org:8080/#/c/8294/4/bin/impala-config.sh@353
PS4, Line 353:     DEFAULT_FS="s3a://${S3_BUCKET}"
             :     export DEFAULT_FS
nit:

export DEFAULT_FS=...

would work. You do that in lines 330 and 340, so it's consistent.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 4
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Dec 2017 22:55:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Michael Brown, Jim Apple, Philip Zeyliger, Sailesh Mukil, David Knupp, Joe McDonnell, Tim Armstrong, Alex Behm, Zach Amsden, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8294

to look at the new patch set (#5).

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
  in core-site.xml.
- using proprietary management tools to store Amazon access keys
  securely
- using Amazon IAM roles bound to VMs running in EC2.

The development minicluster environment used the first approach,
which risked leaking these keys.

This change enables Impala builds to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that environment variables carrying
the traditional AWS credentials do not conflict with credentials supplied
by the IAM role attached to the VM instance.

IAM role based credentials are accessible through the EC2
instance-property mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

The change also removes the remaining references to the s3n: provider.
In the FE tests all URIs referring to s3n: are replaced with their
s3a: equivalents, except for a single negative test in
AnalyzeStmtsTest.java, which is removed.

In addition to the code changes, the s3n: and s3a: credential properties
are also removed from core-site.xml.tmpl. The s3a: provider can pick up
AWS S3 credentials from environment variables or IAM properties bound
to the VM instance, which is a more flexible approach.

As environment variables have precedence over IAM roles, care must be
taken when managing the canonical environment variables carrying
AWS credentials. There are two requirements to be reconciled:
1. The FE tests have code that examines s3a: URIs; this code needs
   existing, but not necessarily valid AWS credentials.
2. When the Impala test suite is executed on an EC2 VM, AWS credentials
   can be supplied via IAM roles. These credentials can be used only
   if the AWS_* environment variables are unset (do not exist).

The tradeoff is managed following these rules:
1. When AWS_* environment variables are set before invoking the
   Impala configuration scripts, their value is preserved and
   the config scripts ensure that the variables are exported.
2. If the AWS_* variables are missing or empty, they will be unset
   to ensure that credentials supplied by Amazon's IAM roles can be
   accessed,
3. except if the scripts are running outside of EC2 (so there can be
   no IAM roles) and TARGET_FILESYSTEM is not set "s3". This combination
   is most often the case on a developer's local workstation.
   In this case the AWS_* credential variables are forcibly set to
   dummy values to allow the FE tests to succeed.
   The removal of S3 credential parameters from core-site.xml[.tmpl]
   also allows users to set up their own credentials there,
   the config scripts will not change those settings.

Environment variables carrying AWS security credentials will be set
up according to the following table:

    Instance:     Running outside EC2 ||  Running in EC2 |
--------------------+--------+--------++--------+--------+
  TARGET_FILESYSTEM |   S3   | not S3 ||   S3   | not S3 |
--------------------+--------+--------++--------+--------+
                    |        |        ||        |        |
              empty | unset  | dummy  ||  unset |  unset |
AWS_*               |        |        ||        |        |
env   --------------+--------+--------++--------+--------+
var                 |        |        ||        |        |
          not empty | export | export || export | export |
                    |        |        ||        |        |
--------------------+--------+--------++--------+--------+

Legend: unset:  the variable is unset
        export: the variable is exported with its current value
        dummy:  the variable is set to a preset dummy value and
                exported

Running on an EC2 VM is indicated by setting RUNNING_IN_EC2 to "true" and
exporting it before impala_config.sh is invoked.

The change also moves the logic performing the S3 access checks into a separate
script file: bin/check-s3-access.sh. This file now contains all the S3-specific
logic and network access to check if the requested S3 bucket can be accessed.

Testing:

Performed local builds for HDFS as well as automated builds against
HDFS and S3, using both IAM roles and explicit AWS_* credentials for
authentication.
Verified that FE tests that parse s3a: URLs are still successful in
all these combinations (when they are run).

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
---
A bin/check-s3-access.sh
M bin/impala-config.sh
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
5 files changed, 177 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8294/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 5
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3000
PS1, Line 3000:       AnalysisError(String.format("load data inpath '%s' %s into table " +
              :           "tpch.lineitem", "s3a://bucket/test-warehouse/test.out", overwrite),
              :           "INPATH location 's3a://bucket/test-warehouse/test.out' must point to an " +
              :           "HDFS, S3A or ADL filesystem.");
Impala cannot load data from s3n. I think this test is intended to verify our error message when given an s3n path, so I don't think an s3a path will work here. The source of the error is LoadDataStmt::analyzePaths().



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Oct 2017 23:30:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 5
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Dec 2017 00:24:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: 169.254.169.254
> What is this address? Can we use a domain name and https?
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

This is EC2 magic. If you're not on EC2, this will tell you to set the environment variables (if you're testing S3). If you're on EC2, this will return something. We don't actually check that those credentials can access the relevant bucket, but at least it lets you through.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:06:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@240
PS1, Line 240: #export AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY-}"
             : #export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID-}"
> Self-inflicted review finding: remove commented-out lines.
Done


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256
PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then
> (Bash hackery)
Thanks for the tip, Philip, I ended up using it.


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@268
PS1, Line 268: test
> tests
Done


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: curl -sf --connect-timeout 1 --max-time 5
> Are these curl options common across a wide variety of OSs? If they are new
Yes. I have tested the curl options on all the platforms on which the packaging build runs. Curl understands these options on all these platforms.


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: http://169.254.169.254/latest/meta-data/iam/security-credentials/
> Is there a AWS reference page you can add as a comment so I can read up mor
Done


http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: 169.254.169.254
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.ht
I have put the URL to the Amazon doc page where this mechanism is described.


http://gerrit.cloudera.org:8080/#/c/8294/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8294/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3000
PS1, Line 3000:       AnalysisError(String.format("load data inpath '%s' %s into table " +
              :           "tpch.lineitem", "s3a://bucket/test-warehouse/test.out", overwrite),
              :           "INPATH location 's3a://bucket/test-warehouse/test.out' must point to an " +
              :           "HDFS, S3A or ADL filesystem.");
> Impala cannot load data from s3n. I think this test is intended to verify o
That's right; I was pattern-matching too eagerly. Thanks for catching this.
I have in fact reverted all the s3n->s3a changes in the FE tests; looks like they don't in fact touch S3 and they are OK with some existing but obviously fake credentials in core-site.xml.


http://gerrit.cloudera.org:8080/#/c/8294/1/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/8294/1/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@61
PS1, Line 61: <!-- BEGIN s3 security settings -->
> Maybe have a comment explaining testdata/cluster/admin needs this (and the 
Done,

referred the Amazon doc site as well.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 15:24:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 17:05:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Michael Brown, Jim Apple, Philip Zeyliger, Sailesh Mukil, David Knupp, Joe McDonnell, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8294

to look at the new patch set (#2).

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
  in core-site.xml.
- using proprietary management tools to store Amazon access keys
  securely
- using Amazon IAM roles bound to VMs running in EC2.

The development minicluster environment used the first approach,
which risked leaking these keys.

This change paves the way for Impala development setups to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that traditional credentials
supplied in environment variables do not conflict with credentials
supplied by the IAM role attached to the VM instance.
The IAM role based credentials are accessible through the EC2
instance-property mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

Changes to the configuration script:
1. bin/impala-config.sh stops setting the AWS_* environment variables
   to dummy default values. When AWS credentials are not supplied in
   the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY,
   these variables are unset (removed from the environment), otherwise
   they would preempt authentication based on the IAM role.
2. Having AWS credentials in the AWS_* environment variables is now
   optional. They are still accepted to allow for private test runs
   accessing private/nondefault buckets with custom credentials.
3. bin/impala-config.sh now checks if credentials are supplied in the
   AWS_* variables or via the IAM role.

Changes to the minicluster configuration:
1. Some front-end tests still refer to the old S3 connector s3n:.
   This connector does not support s3 auth via IAM roles, but this
   is not a problem: these front-end tests don't actually reach out
   to S3, the s3n: notation is just for the FE.
   For these tests to work authentication parameters still need to
   exist for the s3n: connector in core-site.xml, but the values do
   not matter, so the configuration template now has fixed dummy
   values for the s3n: AWS credentials.

2. Remove empty AWS credentials from core-site.xml.tmpl:
   The testdata/cluster/admin setup script substitutes values from
   environment variables into Hadoop *-site.xml configuration files
   when setting up the minicluster runtime environment.

   The configuration section for s3a: credentials are now completely
   removed if:
   - the target filesystem is set to "s3"
   - and the AWS credential environment variables AWS_ACCESS_KEY_ID
     and AWS_SECRET_ACCESS_KEY are both empty or missing.

   The configuration file core-site.xml.tmpl was extended with
   comment markers that delimit the section to be removed in this case.

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
---
M bin/impala-config.sh
M testdata/cluster/admin
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
3 files changed, 92 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8294/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1601/


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 5
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Dec 2017 22:11:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 5
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Comment-Date: Sat, 09 Dec 2017 01:42:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294:   if (set +x; [[ -z ${AWS_ACCESS_KEY_ID-} && -z ${AWS_SECRET_ACCESS_KEY-} ]]); then
> (I think it's fine if this is put in a separate script and called from buil
I'm not sure that putting the checks into a separate script would change anything. The part in lines 252-264 have to remain here because they touch environment variables that are used further in this script.


http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294:   if (set +x; [[ -z ${AWS_ACCESS_KEY_ID-} && -z ${AWS_SECRET_ACCESS_KEY-} ]]); then
> Does impala-config.sh really have to talk to the internet? It just tends to
In most of the use cases the network access is avoided, it happens only if the build is set up to run the tests on S3.  -- and talking to S3 means lots of network access anyway.
Even the current version of the script performs a network access in the S3 case: line 321 uses AWS CLI to check the existence of the specified bucket.

Checking the credential URL is gated by the following environment variables:
- TARGET_FILESYSTEM has to be set to "s3"
- S3_BUCKET has to be non-empty
- AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID both have to be empty or missing
before the script attempts to look for credentials by checking the URL.

Even if a network check is performed, it will not go out to the internet. The specific network address belongs to the "link-local address" category (see https://en.wikipedia.org/wiki/Link-local_address), which is valid only in the immediate network neighborhood.
Within EC2 the request should be server quickly; outside of EC2 I don't expect the target URL to exist at all. Based on this I could set short timeouts, so that the call fails quickly even in rare failure cases.


http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@307
PS2, Line 307:     if ! curl "${CURL_ARGS[@]}" ; then
> curl is not a development dependency in bin/bootstrap_system.sh. I'd sugges
Good point; I'll check if wget can be set up the same way.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 21:40:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294:   if (set +x; [[ -z ${AWS_ACCESS_KEY_ID-} && -z ${AWS_SECRET_ACCESS_KEY-} ]]); then
> Does impala-config.sh really have to talk to the internet? It just tends to
(I think it's fine if this is put in a separate script and called from buildall.sh or testdata/bin/run-all.sh or something like that)



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 18:01:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 3:

(3 comments)

> I also don't want to block progress on making this good improvement
 > to our S3 development, but my fear is that this script will get
 > more and more out of control if we don't draw a line somewhere
 > about what belongs in it.

That's a valid point. I moved the logic to bin/check-s3-access.sh, which actually lets us reuse the the same check elsewhere if it becomes necessary or useful.

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294: export LOCAL_FS="file:${WAREHOUSE_LOCATION_PREFIX}"
> Yeah, I'd question whether "aws ls" belongs in this script either.
Done, moved to bin/check-s3-access.sh.


http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@303
PS2, Line 303: set AWS_A
> This variable will leak out into the user's shell.
Done, I moved these checks into a separate script, check-s3-access.sh


http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@307
PS2, Line 307: 
> Good point; I'll check if wget can be set up the same way.
Done. Although the check was moved to bin/check-s3-access.sh, I have replaced curl with wget, using equivalent parameters for silencing and short timeouts.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:26:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2: Code-Review-1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@294
PS2, Line 294:   if (set +x; [[ -z ${AWS_ACCESS_KEY_ID-} && -z ${AWS_SECRET_ACCESS_KEY-} ]]); then
Does impala-config.sh really have to talk to the internet? It just tends to cause problems if impala-config.sh does things aside from setting environment variables.



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 18:01:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/2/bin/impala-config.sh@307
PS2, Line 307:     if ! curl "${CURL_ARGS[@]}" ; then
curl is not a development dependency in bin/bootstrap_system.sh. I'd suggest using wget or adding curl to the apt-get install list in bin/bootstrap_system.sh



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 18:20:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2:

> Hi Laszlo,
 > 
 > We're interested in this change for avoiding some compatibility
 > problems with Hadoop 3. Any news here?
 > 
 > Thanks!

Hi Philip,

Sorry for the late update on this patch, I was detoured by other, more timeboxed activities. I have just posted a new patch set and I'm looking forward to your comments on it.


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 15:27:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 2:

I also don't want to block progress on making this good improvement to our S3 development, but my fear is that this script will get more and more out of control if we don't draw a line somewhere about what belongs in it.


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 2
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Nov 2017 02:22:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 3: Code-Review+1

(3 comments)

I'm fine with this as-is; there are very minor nits that you can tackle if you'd like. Please do answer the question about changing the FE tests to just use s3a so that we can remove the hackery altogether.

Thanks!

http://gerrit.cloudera.org:8080/#/c/8294/3/bin/check-s3-access.sh
File bin/check-s3-access.sh:

http://gerrit.cloudera.org:8080/#/c/8294/3/bin/check-s3-access.sh@56
PS3, Line 56:   exit 0;
you don't need the semicolon


http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/admin
File testdata/cluster/admin:

http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/admin@289
PS3, Line 289:         sed '/<!-- BEGIN s3a security settings/,/END s3a security settings -->/d' \
You could use "sed -i" to edit this in place, though I know Mac OS X sed behaves slightly differently than Linux sed, causing some trouble. We seem to have some limited "sed -i" usage already.

I'm not insisting.


http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@59
PS3, Line 59:   <!-- Fake s3n credentials required for FE tests -->
Is there any reason not to s/s3a/s3n/ in the tests so that we can get rid of this?



-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 19:59:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................


Patch Set 3:

>[...]Please do answer the question about changing
 > the FE tests to just use s3a so that we can remove the hackery
 > altogether.[...]

In short: it depends on your end goal.
If the end goal is to get rid of the s3n: provider and its supporting menagerie (jets3t, etc.), then yes, it can be done with a small price: a single negative test in AnalyzeStmtsTest.java will have to go. The code there in https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L3000 performs a test step that should fail for s3n -- but if I remove the s3n properties from core-site.xml, this line blows up complaining about missing credentials. The stack trace reveals that Impala's HdfsUri::analyze() calls into Hadoop's Path::getFileSystem() to find out what kind of filesystem the URI points to; this in turn calls down all the way to org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74) being called from Jets3t.

If your end goal is to remove dummy credentials from core-site.xml.tmpl, this will probably not be complete success: the s3a: provider behaves in a similar () though not identical) way. When the front-end tests are run without an S3 container in sight (e.g. your local dev box with no S3 access set up), AnalyzeDDLTest's s3a: URIs still require similar dummy credentials, but with s3a: these can be supplied in environment variables. This means that core-site.xml[.tmpl] could be cleaned up, but dummy credentials in envireonment variables would still be there.


-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Nov 2017 21:04:24 +0000
Gerrit-HasComments: No