You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mr-brobot (via GitHub)" <gi...@apache.org> on 2023/04/30 23:32:11 UTC

[GitHub] [arrow-rs] mr-brobot opened a new pull request, #4161: feat(aws_profile): use profile region as fallback

mr-brobot opened a new pull request, #4161:
URL: https://github.com/apache/arrow-rs/pull/4161

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #4158.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   Rationale in #4158.
   
   # What changes are included in this PR?
   
   * Moved `ProfileProvider` to `aws::profile` module
   * Added `aws::region::RegionProvider`
   * Lazy-init profile credential provider on credentials cache miss
   * Support overriding profile region
   * Tests for defaulting to profile region and for overriding via `AmazonS3Builder::with_region`.
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   When using a named profile, users no longer need to explicitly set a region when using `AmazonS3Builder`.
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] mr-brobot commented on a diff in pull request #4161: object_store: Support region configured via AWS named profile

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1181758270


##########
object_store/src/aws/profile.rs:
##########
@@ -0,0 +1,99 @@
+#![cfg(feature = "aws_profile")]
+
+use aws_config::profile::ProfileFileCredentialsProvider;
+use aws_config::provider_config::ProviderConfig;
+use aws_credential_types::provider::ProvideCredentials;
+use aws_types::region::Region;
+use futures::future::BoxFuture;
+use std::sync::Arc;
+use std::time::Instant;
+use std::time::SystemTime;
+
+use super::credential::CredentialProvider;
+use crate::aws::AwsCredential;
+use crate::client::token::{TemporaryToken, TokenCache};
+use crate::Result;
+
+#[derive(Debug)]
+pub struct ProfileProvider {
+    name: String,
+    region: Option<String>,
+    cache: TokenCache<Arc<AwsCredential>>,
+}
+
+impl ProfileProvider {
+    pub fn new(name: String, region: Option<String>) -> Self {
+        Self {
+            name,
+            region,
+            cache: Default::default(),
+        }
+    }
+}
+
+impl CredentialProvider for ProfileProvider {
+    fn get_credential(&self) -> BoxFuture<'_, Result<Arc<AwsCredential>>> {
+        Box::pin(self.cache.get_or_insert_with(move || async move {
+            let region = match self.region.clone() {
+                Some(r) => Some(Region::new(r)),
+                None => None,
+            };
+
+            let config = ProviderConfig::default().with_region(region);
+
+            let credentials = ProfileFileCredentialsProvider::builder()
+                .configure(&config)
+                .profile_name(&self.name)
+                .build();
+
+            let c = credentials.provide_credentials().await.map_err(|source| {
+                crate::Error::Generic {
+                    store: "S3",
+                    source: Box::new(source),
+                }
+            })?;
+            let t_now = SystemTime::now();
+            let expiry = c
+                .expiry()
+                .and_then(|e| e.duration_since(t_now).ok())
+                .map(|ttl| Instant::now() + ttl);
+
+            Ok(TemporaryToken {
+                token: Arc::new(AwsCredential {
+                    key_id: c.access_key_id().to_string(),
+                    secret_key: c.secret_access_key().to_string(),
+                    token: c.session_token().map(ToString::to_string),
+                }),
+                expiry,
+            })
+        }))
+    }
+}
+
+#[cfg(not(test))]
+mod region {
+    use super::*;
+    use crate::aws::region::RegionProvider;
+    use aws_config::meta::region::ProvideRegion;
+    use aws_config::profile::region::Builder as ProfileRegionBuilder;
+
+    impl RegionProvider for ProfileProvider {
+        #[tokio::main(flavor = "current_thread")]

Review Comment:
   I'd like to use the `BoxFuture` approach, but would that work when calling via a synchronous function? This method is only called from `AmazonS3Builder::build` when determining the region to use in the resulting `AmazonS3` object.
   
   Maybe the right thing to do is make `get_region` async, and let the caller decide how to await/block depending on context/call pattern?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4161: object_store: Support region configured via AWS named profile

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1181472471


##########
object_store/src/aws/profile.rs:
##########
@@ -0,0 +1,99 @@
+#![cfg(feature = "aws_profile")]
+
+use aws_config::profile::ProfileFileCredentialsProvider;
+use aws_config::provider_config::ProviderConfig;
+use aws_credential_types::provider::ProvideCredentials;
+use aws_types::region::Region;
+use futures::future::BoxFuture;
+use std::sync::Arc;
+use std::time::Instant;
+use std::time::SystemTime;
+
+use super::credential::CredentialProvider;
+use crate::aws::AwsCredential;
+use crate::client::token::{TemporaryToken, TokenCache};
+use crate::Result;
+
+#[derive(Debug)]
+pub struct ProfileProvider {
+    name: String,
+    region: Option<String>,
+    cache: TokenCache<Arc<AwsCredential>>,
+}
+
+impl ProfileProvider {
+    pub fn new(name: String, region: Option<String>) -> Self {
+        Self {
+            name,
+            region,
+            cache: Default::default(),
+        }
+    }
+}
+
+impl CredentialProvider for ProfileProvider {
+    fn get_credential(&self) -> BoxFuture<'_, Result<Arc<AwsCredential>>> {
+        Box::pin(self.cache.get_or_insert_with(move || async move {
+            let region = match self.region.clone() {
+                Some(r) => Some(Region::new(r)),
+                None => None,
+            };
+
+            let config = ProviderConfig::default().with_region(region);
+
+            let credentials = ProfileFileCredentialsProvider::builder()
+                .configure(&config)
+                .profile_name(&self.name)
+                .build();
+
+            let c = credentials.provide_credentials().await.map_err(|source| {
+                crate::Error::Generic {
+                    store: "S3",
+                    source: Box::new(source),
+                }
+            })?;
+            let t_now = SystemTime::now();
+            let expiry = c
+                .expiry()
+                .and_then(|e| e.duration_since(t_now).ok())
+                .map(|ttl| Instant::now() + ttl);
+
+            Ok(TemporaryToken {
+                token: Arc::new(AwsCredential {
+                    key_id: c.access_key_id().to_string(),
+                    secret_key: c.secret_access_key().to_string(),
+                    token: c.session_token().map(ToString::to_string),
+                }),
+                expiry,
+            })
+        }))
+    }
+}
+
+#[cfg(not(test))]
+mod region {
+    use super::*;
+    use crate::aws::region::RegionProvider;
+    use aws_config::meta::region::ProvideRegion;
+    use aws_config::profile::region::Builder as ProfileRegionBuilder;
+
+    impl RegionProvider for ProfileProvider {
+        #[tokio::main(flavor = "current_thread")]

Review Comment:
   This will spawn a tokio runtime per request, perhaps we could use the BoxFuture approach used by CredentialProvider?



##########
object_store/src/aws/region.rs:
##########
@@ -0,0 +1,3 @@
+pub trait RegionProvider {

Review Comment:
   This new file will need a license header, this is what RAT is complaining about.
   
   Tbh given how small it is, it might not even warrant its own module



##########
object_store/src/aws/mod.rs:
##########
@@ -1563,3 +1591,64 @@ mod tests {
         }
     }
 }
+
+#[cfg(all(test, feature = "aws_profile"))]
+mod profile_tests {
+    use super::*;
+    use profile::ProfileProvider;
+    use region::RegionProvider;
+    use std::env;
+
+    impl RegionProvider for ProfileProvider {
+        fn get_region(&self) -> Option<String> {

Review Comment:
   I can't help feeling this formulation doesn't really test the region inference logic? Is there some way to create a fake profile for testing purposes?
   
   Perhaps https://docs.rs/aws-config/latest/aws_config/profile/profile_file/struct.ProfileFiles.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] mr-brobot commented on a diff in pull request #4161: object_store: Support region configured via AWS named profile

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1182869893


##########
object_store/src/aws/mod.rs:
##########
@@ -1563,3 +1591,64 @@ mod tests {
         }
     }
 }
+
+#[cfg(all(test, feature = "aws_profile"))]
+mod profile_tests {
+    use super::*;
+    use profile::ProfileProvider;
+    use region::RegionProvider;
+    use std::env;
+
+    impl RegionProvider for ProfileProvider {
+        fn get_region(&self) -> Option<String> {

Review Comment:
   I added a fake profile for testing in `ProfileProvider::get_region_provider`. Let me know if you want to see more tests or think this should be organized differently. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] mr-brobot commented on a diff in pull request #4161: object_store: Support region configured via AWS named profile

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1181723694


##########
object_store/src/aws/mod.rs:
##########
@@ -1563,3 +1591,64 @@ mod tests {
         }
     }
 }
+
+#[cfg(all(test, feature = "aws_profile"))]
+mod profile_tests {
+    use super::*;
+    use profile::ProfileProvider;
+    use region::RegionProvider;
+    use std::env;
+
+    impl RegionProvider for ProfileProvider {
+        fn get_region(&self) -> Option<String> {

Review Comment:
   Agreed, this only tests the logic in `AmazonS3Builder` that falls back on the profile region when a profile is provided.
   
   I'll add a test in the `aws::profile` module that uses [ProfileFiles](https://docs.rs/aws-config/latest/aws_config/profile/profile_file/struct.ProfileFiles.html) to verify the configured profile region is actually what comes out of `ProfileProvider::get_region`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4161: Object Store (AWS): Support region configured via named profile

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1189761207


##########
object_store/src/aws/mod.rs:
##########
@@ -1094,12 +1103,36 @@ impl AmazonS3Builder {
     }
 }
 
+#[cfg(feature = "aws_profile")]
+fn profile_region(profile: String) -> Option<String> {
+    use std::{panic, thread};
+    use tokio::runtime::Handle;
+
+    let handle = Handle::current();
+    let provider = profile::ProfileProvider::new(profile, None);
+
+    let result = thread::spawn(move || handle.block_on(provider.get_region()));
+
+    match result.join() {
+        Ok(region) => region,
+        Err(e) => panic::resume_unwind(e),
+    }

Review Comment:
   ```suggestion
       handle.block_on(provider.get_region())
   ```
   
   Both are equally "problematic" in that they pin a tokio worker, but this at least avoids spawning an additional thread



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4161: Object Store (AWS): Support region configured via named profile

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#issuecomment-1543906408

   Is this still necessary following https://github.com/apache/arrow-rs/pull/4188 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] mr-brobot commented on a diff in pull request #4161: object_store: Support region configured via AWS named profile

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1182798910


##########
object_store/src/aws/mod.rs:
##########
@@ -1094,12 +1103,30 @@ impl AmazonS3Builder {
     }
 }
 
+#[cfg(feature = "aws_profile")]
+fn profile_region(profile: String) -> Option<String> {
+    use crate::aws::credential::RegionProvider;
+    use futures::executor::block_on;
+
+    let provider = profile::ProfileProvider::new(profile, None);
+
+    block_on(provider.get_region())

Review Comment:
   Good call. Unfortunately, using tokio, calling `block_on` on the current runtime results in the following error:
   
   `ERROR: Cannot start a runtime from within a runtime. This happens because a function (like block_on) attempted to block the current thread while the thread is being used to drive asynchronous tasks.`
   
   According to the [`tokio::Handle::block_on` docs](https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.html#method.block_on), we need to spawn a new thread and block there.
   
   ```rust
   use std::{panic, thread};
   use tokio::runtime::Handle;
   
   let handle = Handle::current();
   let provider = profile::ProfileProvider::new(profile, None);
   
   let result = thread::spawn(move || handle.block_on(provider.get_region()));
   
   match result.join() {
       Ok(region) => region,
       Err(e) => panic::resume_unwind(e),
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] mr-brobot commented on pull request #4161: Object Store (AWS): Support region configured via named profile

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#issuecomment-1544295114

   This is definitely far less helpful following #4188.
   
   Personally, it doesn't seem right that Object Store supports AWS profiles at all... It seems like that support should happen "further up the stack" (e.g., in DataFusion CLI). User-facing apps built on top of Object Store will know their users best and can decide how to build the necessary integrations to support named profiles, SSO, etc.
   
   I think #4163 moves in the right direction and I'd be happy to help make DataFusion CLI the reference implementation for whatever approach the community decides on.
   
   But wherever named profile support is implemented, I think it should be complete (i.e., supporting both config and credentials). It's an unusual experience for an AWS user to find that an app supports named profiles, only to receive an error that the region set in their profile config was not used. 🥲 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4161: Object Store (AWS): Support region configured via named profile

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] mr-brobot commented on pull request #4161: object_store: Support region configured via AWS named profile

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#issuecomment-1537167590

   @tustvold This is ready for your review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4161: object_store: Support region configured via AWS named profile

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4161:
URL: https://github.com/apache/arrow-rs/pull/4161#discussion_r1182471516


##########
object_store/src/aws/mod.rs:
##########
@@ -1094,12 +1103,30 @@ impl AmazonS3Builder {
     }
 }
 
+#[cfg(feature = "aws_profile")]
+fn profile_region(profile: String) -> Option<String> {
+    use crate::aws::credential::RegionProvider;
+    use futures::executor::block_on;
+
+    let provider = profile::ProfileProvider::new(profile, None);
+
+    block_on(provider.get_region())

Review Comment:
   ```suggestion
       Handle::current().block_on(provider.get_region())
   ```
   
   Using [`tokio::Handle`](https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.html#method.block_on) might be safer, as I believe the upstream assumes tokio



##########
object_store/src/aws/credential.rs:
##########
@@ -291,6 +292,12 @@ fn canonicalize_headers(header_map: &HeaderMap) -> (String, String) {
     (signed_headers, canonical_headers)
 }
 
+/// Provides a region name where cloud resources are located
+#[async_trait]
+pub trait RegionProvider {

Review Comment:
   AFAICT get_region is only called on a concrete ProfileProvider, and so I don't see a region for this trait over just a regular member function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org