Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/12/22 17:43:29 UTC

[I] GCS Signed URL Support [arrow-rs]

tustvold opened a new issue, #5233:
URL: https://github.com/apache/arrow-rs/issues/5233

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   https://github.com/apache/arrow-rs/pull/4876 added support for generating signed S3 URLs via the [Signer](https://docs.rs/object_store/latest/object_store/signer/trait.Signer.html) trait. This ticket tracks implementing `Signer` for [GoogleCloudStorage](https://docs.rs/object_store/latest/object_store/gcp/struct.GoogleCloudStorage.html).
   
   **Describe the solution you'd like**
   
   The process for generating signed URLs is described [here](https://cloud.google.com/storage/docs/access-control/signed-urls).
   
   Once the stringToSign has been constructed there are [two mechanisms](https://cloud.google.com/storage/docs/access-control/signing-urls-manually) for generating the signature:
   
   * Directly sign the URL using the RSA key pair of a service account
   * Make an authorized API call to the [signBlob](https://cloud.google.com/storage/docs/authentication/creating-signatures) API
   
   The latter approach will work with every [GcpCredentialProvider](https://docs.rs/object_store/latest/object_store/gcp/type.GcpCredentialProvider.html) and is therefore probably the approach to start with. The former is a touch more fiddly, and will likely involve some rejigging of `GoogleCloudStorageBuilder::build` to expose the `ServiceAccountKey` in such a way that it can be used by the `Signer` implementation. It would be perfectly acceptable for the first version to only support the signBlob approach.
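   
   Both mechanisms sign the same stringToSign, so it can be assembled independently of how the signature is produced. The following is a minimal sketch, assuming the V4 layout from the linked Google documentation (algorithm, timestamp, credential scope, hashed canonical request, each on its own line). The function name and parameters are hypothetical and not part of object_store, and the canonical request is assumed to have been hashed already.
   
   ```rust
   /// Hypothetical helper (not part of object_store): assembles the V4
   /// string-to-sign from an already-hashed canonical request.
   ///
   /// `timestamp` is expected in ISO 8601 basic format, e.g. "20231222T174329Z".
   /// Returns `None` if the timestamp is too short to contain a YYYYMMDD date.
   pub fn string_to_sign(
       timestamp: &str,
       location: &str,
       canonical_request_sha256_hex: &str,
   ) -> Option<String> {
       // The credential scope starts with the date portion of the timestamp.
       let date = timestamp.get(..8)?;
       let scope = format!("{date}/{location}/storage/goog4_request");
       Some(format!(
           "GOOG4-RSA-SHA256\n{timestamp}\n{scope}\n{canonical_request_sha256_hex}"
       ))
   }
   ```
   
   The resulting string is what would either be signed locally with the service account's RSA key or sent to signBlob as the payload.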
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   Split out of https://github.com/apache/arrow-rs/issues/3027
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1897834567

   > 1. The GCSCredential seems too simple. It uses a self-signed JWT to request the temporary token and then retrieves the objects. I think it should use the service account credential to retrieve the objects, like how it is done in the AWS part.
   
   Yep. GCP has different kinds of service accounts, such as `service_account`, `external_account` and `impersonated_service_account`. We should store that service account information so that we can presign users' requests.
   
   reqsign implements most of them and is worth a look: https://github.com/Xuanwo/reqsign/blob/main/src/google/credential.rs




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "l1nxy (via GitHub)" <gi...@apache.org>.
l1nxy commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1900441314

   Thank you for your assistance, and I will attempt to implement it.




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1898170040

   I think it is important to separate the credential from how it is obtained, as we support a large number of auth options beyond service account credentials. For request authorization, all that is needed is a JWT, so that is what GcpCredential contains; there are then various ways of obtaining such a JWT.
   
   The problem is that signing requires additional information. This will need to be plumbed through from the builder, sourced either from the underlying credential provider before it is type-erased or from an explicit config option passed to the builder.




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "tdikland (via GitHub)" <gi...@apache.org>.
tdikland commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-2025251381

   Any progress on this issue? I would be willing to step in and help out where possible.




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "l1nxy (via GitHub)" <gi...@apache.org>.
l1nxy commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1895907335

   @tustvold Hi, I'm confused about the code. Can I ask some questions?
   1. The GCSCredential seems too simple. It uses a self-signed JWT to request the temporary token and then retrieves the objects. I think it should use the service account credential to retrieve the objects, like how it is done in the AWS part.
   2. Now, I'm stuck on the `signBlob` request because it requires the service account email in order to create a signature, but this information is not included in the GCS credential struct.
   3. My question is: what should I do next? If we can solve the credential problem, both methods (`signBlob` and using an RSA key to generate the URL) can be completed.
   
   Additional information:
   > SERVICE_ACCOUNT_EMAIL is the email address of the service account you want to use to create the signature. For example, service-7550275089395@my-pet-project.iam.gserviceaccount.com.
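   
   To make the dependency on the service account email concrete, here is a rough sketch of how a `signBlob` request could be assembled. It assumes the IAM Credentials endpoint form `projects/-/serviceAccounts/SERVICE_ACCOUNT_EMAIL:signBlob` and a JSON body carrying the base64-encoded payload, as described in the linked Google documentation. `SignBlobRequest`, `sign_blob_request` and `base64_encode` are hypothetical names, and the request would still need to be sent as an authorized POST with a bearer token.
   
   ```rust
   const ALPHABET: &[u8; 64] =
       b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
   
   /// Standard (padded) base64 encoding, written out to keep the sketch std-only.
   pub fn base64_encode(input: &[u8]) -> String {
       let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
       for chunk in input.chunks(3) {
           // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
           let b0 = chunk[0] as u32;
           let b1 = *chunk.get(1).unwrap_or(&0) as u32;
           let b2 = *chunk.get(2).unwrap_or(&0) as u32;
           let n = (b0 << 16) | (b1 << 8) | b2;
           out.push(ALPHABET[(n >> 18) as usize & 63] as char);
           out.push(ALPHABET[(n >> 12) as usize & 63] as char);
           // Emit '=' padding where the input chunk had fewer than three bytes.
           out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
           out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
       }
       out
   }
   
   /// Hypothetical description of the HTTP request to issue (method is POST).
   pub struct SignBlobRequest {
       pub url: String,
       pub body: String,
   }
   
   /// Builds the request; note that the service account email is part of the URL,
   /// which is why the credential alone is not enough.
   pub fn sign_blob_request(service_account_email: &str, string_to_sign: &str) -> SignBlobRequest {
       let url = format!(
           "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{service_account_email}:signBlob"
       );
       // Base64 output contains no characters that need JSON escaping.
       let body = format!("{{\"payload\":\"{}\"}}", base64_encode(string_to_sign.as_bytes()));
       SignBlobRequest { url, body }
   }
   ```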




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "l1nxy (via GitHub)" <gi...@apache.org>.
l1nxy commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1886556686

   I'd like to give this a try and write a first draft. Maybe I can write more later.




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "l1nxy (via GitHub)" <gi...@apache.org>.
l1nxy commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1886800048

   @tustvold Could you please assign this issue to me? It will serve as a reminder for me.




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #5233:
URL: https://github.com/apache/arrow-rs/issues/5233#issuecomment-1900402271

   I've done some more digging. Findings, in no particular order:
   
   **InstanceMetadata**
   
   The instance metadata token endpoint only provides an access token and no email, although the email can be retrieved with
   
   ```
   curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email
   ```
   
   **AuthorizedUserCredentials**
   
   As described [here](https://google.aip.dev/auth/4113), the configuration metadata does not contain information on the calling identity. Additionally, as described [here](https://cloud.google.com/docs/authentication/token-types#access), the returned access token is opaque and does not provide information on the identity either.
   
   However, the information can be retrieved by calling a specific API endpoint [here](https://cloud.google.com/docs/authentication/token-types#access-contents).
   
   ```
   curl "https://oauth2.googleapis.com/tokeninfo?access_token=ACCESS_TOKEN"
   ```
   
   **HMAC Keys**
   
   I am not sure this is something we will want to support, but you can also sign requests using [HMAC keys](https://cloud.google.com/storage/docs/aws-simple-migration#authentication).
   
   **Conclusion**
   
   I think we will need an approach that allows different signing methodologies based on the underlying credential provider. The simplest way to do this is likely to implement `Signer` for the various credential providers, i.e. `SelfSignedJwt`, `InstanceCredentialProvider`, etc., then store an `Arc<dyn Signer>` on `GoogleCloudStorage` and implement `Signer` for `GoogleCloudStorage` by calling through to it.
   
   Let me know if anything isn't clear.
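   
   As a rough illustration of that shape only: the sketch below uses a simplified, synchronous stand-in for the `Signer` trait (the real trait in object_store is async and uses its own method, path and URL types). `PlaceholderSigner` and `UnsupportedSigner` are hypothetical names, the `GoogleCloudStorage` struct here is a stand-in rather than the real type, and the signature in the URL is a placeholder rather than a real one.
   
   ```rust
   use std::sync::Arc;
   use std::time::Duration;
   
   /// Simplified, synchronous stand-in for the `Signer` trait.
   pub trait Signer: Send + Sync {
       fn signed_url(&self, method: &str, path: &str, expires_in: Duration)
           -> Result<String, String>;
   }
   
   /// Hypothetical signer for a credential source that is able to sign.
   pub struct PlaceholderSigner;
   
   impl Signer for PlaceholderSigner {
       fn signed_url(&self, _method: &str, path: &str, expires_in: Duration)
           -> Result<String, String> {
           // A real implementation would build and sign the canonical request here.
           Ok(format!(
               "https://storage.googleapis.com/{path}?X-Goog-Expires={}&X-Goog-Signature=PLACEHOLDER",
               expires_in.as_secs()
           ))
       }
   }
   
   /// Hypothetical signer for a credential source that cannot sign.
   pub struct UnsupportedSigner;
   
   impl Signer for UnsupportedSigner {
       fn signed_url(&self, _method: &str, _path: &str, _expires_in: Duration)
           -> Result<String, String> {
           Err("signing is not supported for this credential source".to_string())
       }
   }
   
   /// Stand-in for the store: it holds a type-erased signer chosen at build time.
   pub struct GoogleCloudStorage {
       pub bucket: String,
       pub signer: Arc<dyn Signer>,
   }
   
   impl Signer for GoogleCloudStorage {
       fn signed_url(&self, method: &str, path: &str, expires_in: Duration)
           -> Result<String, String> {
           // The store only adds the bucket and delegates to the stored signer.
           let object = format!("{}/{}", self.bucket, path);
           self.signer.signed_url(method, &object, expires_in)
       }
   }
   ```
   
   With this arrangement the builder decides which signer to store, so the store itself does not need to know which credential source is in use.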




Re: [I] GCS Signed URL Support [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #5233: GCS Signed URL Support
URL: https://github.com/apache/arrow-rs/issues/5233

