Posted to dev@beam.apache.org by Jacob Marble <ja...@gmail.com> on 2018/03/13 16:44:22 UTC

Dealing with AWS Regions

Starting a new thread dedicated to handling AWS regions better, in the
context of S3 and Redshift.

The S3FileSystem.amazonS3 build could be refactored to select the region in
the following order of precedence [1]:
1. the value of the region flag
2. the EC2 region, if found in the environment (running in an EC2 VM)
3. the default region (us-east-1)
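
As a rough sketch of that precedence: RegionResolver and its parameters are hypothetical names, not part of Beam or the AWS SDK, and the EC2 lookup is passed in as an Optional rather than queried from the SDK.

```java
import java.util.Optional;

/** Hypothetical helper illustrating the precedence; not a Beam or AWS SDK class. */
final class RegionResolver {
  static final String DEFAULT_REGION = "us-east-1";

  /**
   * @param flagRegion value of the region flag; null or blank means "not set"
   * @param ec2Region  region detected from the EC2 environment; empty when not on EC2
   */
  static String resolve(String flagRegion, Optional<String> ec2Region) {
    // 1. An explicit flag value always wins.
    if (flagRegion != null && !flagRegion.trim().isEmpty()) {
      return flagRegion.trim();
    }
    // 2. Otherwise use the EC2 region if present, 3. else the default.
    return ec2Region.orElse(DEFAULT_REGION);
  }

  public static void main(String[] args) {
    System.out.println(resolve("eu-west-1", Optional.of("us-west-2"))); // eu-west-1
    System.out.println(resolve(null, Optional.of("us-west-2")));        // us-west-2
    System.out.println(resolve("", Optional.empty()));                  // us-east-1
  }
}
```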

For actually moving data, a Map<String, AmazonS3> could hold one S3 client
per region, with new clients created as needed. The "master" client can be
used to look up a bucket's region [2].

I think this is pragmatic, and I'm looking for feedback before I write a PR.
Also, if someone is already making progress on this, let me know.

Jacob

[1]
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-region-selection.html

[2]
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html#getBucketLocation-java.lang.String-

Re: Dealing with AWS Regions

Posted by Lukasz Cwik <lc...@google.com>.

For now, I think we should stick with registering different filesystem
configurations under different schemes, so we would use
s3a://, s3b://, and s3c://.
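
To illustrate the scheme-based idea, here is a small sketch in which each scheme maps to one configuration, reduced to a region string for brevity. SchemeRegistry and its methods are hypothetical and are not Beam APIs; the regions are made-up example values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry mapping a URI scheme to one filesystem configuration. */
final class SchemeRegistry {
  private final Map<String, String> regionByScheme = new HashMap<>();

  /** Registers a scheme once; a second registration is treated as an error. */
  void register(String scheme, String region) {
    if (regionByScheme.putIfAbsent(scheme, region) != null) {
      throw new IllegalArgumentException("Scheme already registered: " + scheme);
    }
  }

  /** Picks the configuration from the scheme prefix of a path such as s3a://bucket/key. */
  String regionFor(String path) {
    int separator = path.indexOf("://");
    if (separator < 0) {
      throw new IllegalArgumentException("Path has no scheme: " + path);
    }
    String scheme = path.substring(0, separator);
    String region = regionByScheme.get(scheme);
    if (region == null) {
      throw new IllegalArgumentException("No filesystem registered for scheme: " + scheme);
    }
    return region;
  }

  public static void main(String[] args) {
    SchemeRegistry registry = new SchemeRegistry();
    registry.register("s3a", "us-east-1");
    registry.register("s3b", "eu-west-1");
    System.out.println(registry.regionFor("s3b://my-bucket/data.csv")); // eu-west-1
  }
}
```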

If you go down the route of enhancing S3Options (similar to
HadoopFileSystemOptions) so that multiple S3 filesystems can be registered
under different schemes, every runner would transport this configuration to
the workers for free via PipelineOptions.

Saving the set of registered FileSystems so that it is available on workers
is more work but, in my opinion, much more flexible: it would decouple
FileSystems from PipelineOptions, so they could be constructed and
registered directly.

What were you thinking?
