You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/20 05:22:40 UTC

[GitHub] [arrow] corleyma opened a new pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

corleyma opened a new pull request #7803:
URL: https://github.com/apache/arrow/pull/7803


   ### Background
   AWS provides a mechanism for using temporary credentials to access AWS resources.  When accessing AWS resources with a set of temporary credentials,[ users must provide a session token](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html) in addition to the usual access key id and secret access key.
   
   This PR adds support for providing a session token when initializing an S3FileSystem.
   
   ### Changes
   #### C++
   - updated `S3Options.FromAccessKey` and `S3Options.ConfigureAccessKey` to accept an optional `session_token` argument (defaulting to empty string, in accordance with the convention of [Aws::Auth::AWSCredentials](https://sdk.amazonaws.com/cpp/api/0.12.9/d4/d27/class_aws_1_1_auth_1_1_a_w_s_credentials.html) as implemented in the AWS C++ SDK.)
   - added `S3Options.GetSessionToken` method
   
   ##### Python
   - updated cdef `CS3Options` to reflect updates to C++ library
   - added optional `session_token` argument to `S3FileSystem.__init__`.
   
   ### Testing
   - updated `test_s3_options` python unittest to include a mock session_token when initializing S3FileSystem.
   - successfully ran ctest suite and pytest suite with `--enable-s3` option.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r461493406



##########
File path: cpp/src/arrow/filesystem/s3fs.h
##########
@@ -62,10 +62,12 @@ struct ARROW_EXPORT S3Options {
   void ConfigureAnonymousCredentials();
 
   /// Configure with explicit access and secret key.
-  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key);
+  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key,
+                          const std::string& session_token = "");
 
   std::string GetAccessKey() const;
   std::string GetSecretKey() const;
+  std::string GetSessionToken() const;

Review comment:
       Please cover the new API with tests on the C++ side.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-672112334


   > @corleyma Can you apply this patch? https://gist.github.com/pitrou/0dc970a2238b9c19b5f8fb991d2fb8f7
   
   Whoops, didn't notice you were playing in this fork too.  I'll quit being destructive and let you guys squash on merge.  Patch applied.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r467821545



##########
File path: python/pyarrow/_s3fs.pyx
##########
@@ -81,9 +81,9 @@ cdef class S3FileSystem(FileSystem):
     cdef:
         CS3FileSystem* s3fs
 
-    def __init__(self, *, access_key=None, secret_key=None, anonymous=False,
-                 region=None, scheme=None, endpoint_override=None,
-                 bint background_writes=True):
+    def __init__(self, *, access_key=None, secret_key=None, session_token=None,

Review comment:
       updated docstring accordingly, thanks for catching that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r468174939



##########
File path: cpp/src/arrow/filesystem/s3fs_test.cc
##########
@@ -197,6 +197,32 @@ TEST(S3Options, FromUri) {
   ASSERT_RAISES(Invalid, S3Options::FromUri("s3:///foo/bar/", &path));
 }
 
+TEST(S3Options, FromAccessKey) {
+  S3Options options;
+
+  // session token is optional and should default to empty string
+  options = S3Options::FromAccessKey("access", "secret");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "");
+
+  options = S3Options::FromAccessKey("access", "secret", "token");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "token");
+}
+
+TEST(S3Options, FromAssumeRole) {
+  S3Options options;
+
+  // the only required argument should be role arn
+  options = S3Options::FromAssumeRole("my_role_arn");

Review comment:
       I added 2 relevant changes:
   - I updated the test to set an env var that will cause DefaultAWSCredentialsProviderChain to skip trying the EC2 instance metadata endpoint
   - I added the ability to provide your own STSClient instead of using the default configuration, so users can choose to use something other than e.g. the DefaultAWSCredentialsProviderChain.
   The test runtime is down from ~5s to ~3ms.

##########
File path: cpp/src/arrow/filesystem/s3fs_test.cc
##########
@@ -197,6 +197,32 @@ TEST(S3Options, FromUri) {
   ASSERT_RAISES(Invalid, S3Options::FromUri("s3:///foo/bar/", &path));
 }
 
+TEST(S3Options, FromAccessKey) {
+  S3Options options;
+
+  // session token is optional and should default to empty string
+  options = S3Options::FromAccessKey("access", "secret");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "");
+
+  options = S3Options::FromAccessKey("access", "secret", "token");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "token");
+}
+
+TEST(S3Options, FromAssumeRole) {
+  S3Options options;
+
+  // the only required argument should be role arn
+  options = S3Options::FromAssumeRole("my_role_arn");

Review comment:
       I added 2 relevant changes:
   - I updated the test to set an env var that will cause DefaultAWSCredentialsProviderChain to skip trying the EC2 instance metadata endpoint
   - I added the ability to provide your own STSClient instead of using the default configuration, so users can choose to use something other than e.g. the DefaultAWSCredentialsProviderChain.
   
   The test runtime is down from ~5s to ~3ms.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671959521


   >  Is there anything I need to do to get the CI systems to pick up the new conda-forge packages?
   
   I don't think so, perhaps there is a CDN in-between that delays the package updates. Let's try again.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r468605319



##########
File path: cpp/src/arrow/filesystem/s3fs_test.cc
##########
@@ -197,6 +201,44 @@ TEST(S3Options, FromUri) {
   ASSERT_RAISES(Invalid, S3Options::FromUri("s3:///foo/bar/", &path));
 }
 
+TEST(S3Options, FromAccessKey) {
+  S3Options options;
+
+  // session token is optional and should default to empty string
+  options = S3Options::FromAccessKey("access", "secret");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "");
+
+  options = S3Options::FromAccessKey("access", "secret", "token");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "token");
+}
+
+TEST(S3Options, FromAssumeRole) {
+  S3Options options;
+
+  // we set this envvar to speed up tests by ensuring
+  // DefaultAWSCredentialsProviderChain does not query (inaccessible)
+  // EC2 metadata endpoint
+  ASSERT_OK(SetEnvVar("AWS_EC2_METADATA_DISABLED", "true"));

Review comment:
       So, the right way to do this would be to create a test fixture:
   ```c++
   class S3OptionsTest : public ::testing::Test {
    public:
     void SetUp() {
       ASSERT_OK(SetEnvVar("AWS_EC2_METADATA_DISABLED", "true"));
     }
     void TearDown() {
       ASSERT_OK(DelEnvVar("AWS_EC2_METADATA_DISABLED"));
     }
   };
   ```
   And then to replace every `TEST(S3Options, ...)` with `TEST_F(S3OptionsTest, ...)`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-673634727


   @github-actions crossbow submit -g conda


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-660811506


   https://issues.apache.org/jira/browse/ARROW-9517


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-673186485


   @github-actions crossbow submit -g conda


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-673434441


   @corleyma AFAIK, they should have been updated automatically but it looks like they haven't. Will take a look.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-673736320


   @pitrou I think whatever you did fixed the Github Actions-based CI; I can't seem to relaunch the ursabot/crossbow-based CI tasks (permissions, perhaps?), but I am guessing if those kick off again everything might be green.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-674781958


   @corleyma Unfortunately, the "ursabot" builders use a different execution scheme. The packages need to be updated manually there, I believe. I pinged @kszucs who should be able to handle it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671963588


   @corleyma Can you apply this patch? https://gist.github.com/pitrou/0dc970a2238b9c19b5f8fb991d2fb8f7


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r467821450



##########
File path: python/pyarrow/_s3fs.pyx
##########
@@ -105,9 +105,14 @@ cdef class S3FileSystem(FileSystem):
                 raise ValueError(
                     'Cannot pass anonymous=True together with access_key '
                     'and secret_key.')
+
+            if session_token is None:
+                session_token = ""
+
             options = CS3Options.FromAccessKey(
                 tobytes(access_key),
-                tobytes(secret_key)
+                tobytes(secret_key),
+                tobytes(session_token)

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r465782709



##########
File path: python/pyarrow/_s3fs.pyx
##########
@@ -105,9 +105,14 @@ cdef class S3FileSystem(FileSystem):
                 raise ValueError(
                     'Cannot pass anonymous=True together with access_key '
                     'and secret_key.')
+
+            if session_token is None:
+                session_token = ""
+
             options = CS3Options.FromAccessKey(
                 tobytes(access_key),
-                tobytes(secret_key)
+                tobytes(secret_key),
+                tobytes(session_token)

Review comment:
       If `session_token` is only used when either `access_key` or `secret_key` is given, then an error should be raised when `session_token and access_key is None and secret_key is None`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-664598308


   We're very excited about the 1.0.0 release so no apologies necessary!  
   
   Out of curiosity, what is the likely timeline between merge of this PR and an official release containing these changes?  Is the cadence of one release every few months likely to continue?  Just wondering about whether or not we should be doing our own internal builds.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671961201


   @corleyma Hmm, it seems that by force-pushing you deleted the changes I made yesterday.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r467858318



##########
File path: cpp/src/arrow/filesystem/s3fs_test.cc
##########
@@ -197,6 +197,32 @@ TEST(S3Options, FromUri) {
   ASSERT_RAISES(Invalid, S3Options::FromUri("s3:///foo/bar/", &path));
 }
 
+TEST(S3Options, FromAccessKey) {
+  S3Options options;
+
+  // session token is optional and should default to empty string
+  options = S3Options::FromAccessKey("access", "secret");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "");
+
+  options = S3Options::FromAccessKey("access", "secret", "token");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "token");
+}
+
+TEST(S3Options, FromAssumeRole) {
+  S3Options options;
+
+  // the only required argument should be role arn
+  options = S3Options::FromAssumeRole("my_role_arn");

Review comment:
       This test seems to take a very long time on my local machine. Apparently it's trying to make connections to `169.254.169.254` in a loop. Is there a way to avoid that?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671774899


   > I've submitted a PR for conda-forge at [conda-forge/aws-sdk-cpp-feedstock#124](https://github.com/conda-forge/aws-sdk-cpp-feedstock/pull/124)
   
   Thanks for that @pitrou.  Is there anything I need to do to get the CI systems to pick up the new conda-forge packages?  I see relevant releases here: https://anaconda.org/conda-forge/aws-sdk-cpp/files?version=1.7.164, but I recently pushed a change that is still failing conda builds.  It looks like the previous release is still be used for the respective platforms.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7803:
URL: https://github.com/apache/arrow/pull/7803


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-674781958


   @corleyma Unfortunately, the "ursabot" builders use a different execution scheme. The packages need to be updated manually there, I beliave. I pinged @kszucs who should be able to handle it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671959521


   >  Is there anything I need to do to get the CI systems to pick up the new conda-forge packages?
   
   I don't think so, perhaps there is a CDN in-between that delays the package updates.
   
   Edit: no, that's because the conda packages are installed in a Docker image. We'll need to wait for the Docker image to be rebuilt...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r467822801



##########
File path: cpp/src/arrow/filesystem/s3fs.h
##########
@@ -62,10 +62,12 @@ struct ARROW_EXPORT S3Options {
   void ConfigureAnonymousCredentials();
 
   /// Configure with explicit access and secret key.
-  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key);
+  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key,
+                          const std::string& session_token = "");
 
   std::string GetAccessKey() const;
   std::string GetSecretKey() const;
+  std::string GetSessionToken() const;

Review comment:
       I added C++ tests to `s3fs_test.cc` to cover these changes.  Unfortunately, adding support for STS-like temporary credentials to Minio involves addition of another dependency, Keycloak.  I spent a bit of time reading relevant docs but could not seem to find a programmatic way to replicate the desired behavior (the Keycloak guides I found were very UI-driven).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-674806014


   @ursabot build


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671321491


   I've submitted a PR for conda-forge at https://github.com/conda-forge/aws-sdk-cpp-feedstock/pull/124


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r467843000



##########
File path: cpp/src/arrow/filesystem/s3fs.h
##########
@@ -62,10 +62,12 @@ struct ARROW_EXPORT S3Options {
   void ConfigureAnonymousCredentials();
 
   /// Configure with explicit access and secret key.
-  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key);
+  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key,
+                          const std::string& session_token = "");
 
   std::string GetAccessKey() const;
   std::string GetSecretKey() const;
+  std::string GetSessionToken() const;

Review comment:
       I think it's ok not to have any deeper test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-664613233


   > Is the cadence of one release every few months likely to continue?
   
   I'd expect so. Depending on your use case and stability requirements, you could try using our nightly packages. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-674891087


   Seems like we're fine now, will merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-674786830


   Indeed, the docker images there need to be rebuilt. I'm updating them.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r468369686



##########
File path: cpp/src/arrow/filesystem/s3fs.h
##########
@@ -62,10 +75,18 @@ struct ARROW_EXPORT S3Options {
   void ConfigureAnonymousCredentials();
 
   /// Configure with explicit access and secret key.
-  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key);
+  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key,
+                          const std::string& session_token = "");
+
+  /// Configure with credentials from an assumed role.
+  void ConfigureAssumeRoleCredentials(
+      const std::string& role_arn, const std::string& session_name = "",
+      const std::string& external_id = "", int load_frequency = 900,
+      const std::shared_ptr<Aws::STS::STSClient>& stsClient = nullptr);

Review comment:
       I inferred this is something to do with compliance with C++/CLI; replaced with existing NULLPTR macro.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671774899


   > I've submitted a PR for conda-forge at [conda-forge/aws-sdk-cpp-feedstock#124](https://github.com/conda-forge/aws-sdk-cpp-feedstock/pull/124)
   
   Thanks for that @pitrou.  Is there anything I need to do to get the CI systems to pick up the new conda-forge packages?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671287472


   I addressed the comments in the initial round of review, and I also added deeper support for temporary credentials by enabling the use of `STSAssumeRoleCredentialsProvider` to auto-refresh temporary credentials.  However, this seems to break conda-based builds because of an inability to find/link additional dependencies from the aws c++ sdk (sts).  Unfortunately my knowledge of both conda and CMake is a bit limited, so I'm not sure how to go about addressing that.  I'm happy to dig a bit deeper if one of you can point the way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671963588


   @corleyma Can you apply this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r465781641



##########
File path: python/pyarrow/_s3fs.pyx
##########
@@ -81,9 +81,9 @@ cdef class S3FileSystem(FileSystem):
     cdef:
         CS3FileSystem* s3fs
 
-    def __init__(self, *, access_key=None, secret_key=None, anonymous=False,
-                 region=None, scheme=None, endpoint_override=None,
-                 bint background_writes=True):
+    def __init__(self, *, access_key=None, secret_key=None, session_token=None,

Review comment:
       You must add `session_token` in the docstring above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671302026


   Thank you @corleyma. I'll take a look at the conda-forge part.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-673070138


   @pitrou any idea what the cadence is for rebuilding those Docker images used by the CI system (or whether we can use our own base image)? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

Posted by GitBox <gi...@apache.org>.
corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r468195140



##########
File path: cpp/src/arrow/filesystem/s3fs.h
##########
@@ -62,10 +75,18 @@ struct ARROW_EXPORT S3Options {
   void ConfigureAnonymousCredentials();
 
   /// Configure with explicit access and secret key.
-  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key);
+  void ConfigureAccessKey(const std::string& access_key, const std::string& secret_key,
+                          const std::string& session_token = "");
+
+  /// Configure with credentials from an assumed role.
+  void ConfigureAssumeRoleCredentials(
+      const std::string& role_arn, const std::string& session_name = "",
+      const std::string& external_id = "", int load_frequency = 900,
+      const std::shared_ptr<Aws::STS::STSClient>& stsClient = nullptr);

Review comment:
       `cpplint` does not like this, but I am just mirroring the default value used by the SDK.  Should I suppress the violation or is there indeed a better practice?  I'm a C++ neophyte so pointers (ha!) appreciated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org