Posted to user@spark.apache.org by Daniel Siegmann <ds...@securityscorecard.io> on 2016/09/27 14:53:36 UTC

Access S3 buckets in multiple accounts

I am running Spark on Amazon EMR and writing data to an S3 bucket. However,
the data is read from an S3 bucket in a separate AWS account. Setting the
fs.s3a.access.key and fs.s3a.secret.key values is sufficient to get access
to the other account (using the s3a protocol), but then I no longer have
access to the S3 bucket in the EMR cluster's AWS account.

Is there any way for Spark to access S3 buckets in multiple accounts? If
not, is there any best practice for how to work around this?

--
Daniel Siegmann
Senior Software Engineer
*SecurityScorecard Inc.*
214 W 29th Street, 5th Floor
New York, NY 10001

Re: Access S3 buckets in multiple accounts

Posted by Daniel Siegmann <ds...@securityscorecard.io>.
Thanks for the help everyone. I was able to get permissions configured for
my cluster so it now has access to the bucket on the other account.




On Wed, Sep 28, 2016 at 10:03 AM, Steve Loughran <st...@hortonworks.com>
wrote:

> There are two ways to do this without changing permissions:
>
> 1. Different implementations: use s3a for one bucket and s3n for the
> other, and give them the different secrets.
>
> 2. Insecure: put the secrets in the URI:
> s3a://AWSID:escaped-secret@bucket/path
> This leaks your secrets throughout the logs and has problems with "/" in
> the secret key; if there is one, you'll probably need to regenerate the
> secret key.
>
> This is going to have to be fixed in the s3a implementation at some point,
> as it's not only needed for cross-user auth. Once you switch to v4 AWS auth
> you need to specify the appropriate S3 endpoint for your region; you can't
> just use S3 central, but need to choose S3 Frankfurt, S3 Seoul, etc., so
> you won't be able to work with data across regions.
>

Re: Access S3 buckets in multiple accounts

Posted by Steve Loughran <st...@hortonworks.com>.
There are two ways to do this without changing permissions:

1. Different implementations: use s3a for one bucket and s3n for the other, and give them the different secrets.

2. Insecure: put the secrets in the URI: s3a://AWSID:escaped-secret@bucket/path
This leaks your secrets throughout the logs and has problems with "/" in the secret key; if there is one, you'll probably need to regenerate the secret key.
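
As a rough illustration of the two workarounds above, the following Python sketch builds the Spark settings for option 1 and the URI for option 2. The function names, bucket name and key values are hypothetical placeholders. The properties fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are the s3n counterparts of the s3a properties from the question, and the spark.hadoop. prefix is how Spark forwards settings to the Hadoop configuration. The sketch assumes both connectors are available on the cluster.

```python
from urllib.parse import quote


def two_scheme_conf(s3a_key, s3a_secret, s3n_key, s3n_secret):
    """Option 1: one set of credentials per connector (s3a vs. s3n)."""
    return {
        "spark.hadoop.fs.s3a.access.key": s3a_key,
        "spark.hadoop.fs.s3a.secret.key": s3a_secret,
        "spark.hadoop.fs.s3n.awsAccessKeyId": s3n_key,
        "spark.hadoop.fs.s3n.awsSecretAccessKey": s3n_secret,
    }


def uri_with_secrets(access_key, secret_key, bucket, path):
    """Option 2 (insecure): embed percent-encoded credentials in the URI."""
    # safe="" makes quote() escape "/" as well, the character called out above.
    return "s3a://{}:{}@{}/{}".format(
        quote(access_key, safe=""),
        quote(secret_key, safe=""),
        bucket,
        path.lstrip("/"),
    )


# Placeholder values only; never hard-code real credentials.
conf = two_scheme_conf("KEY_A", "SECRET_A", "KEY_B", "SECRET_B")
uri = uri_with_secrets("KEY_B", "ab/cd+ef", "other-account-bucket", "input/data")
print(uri)
```

The dictionary could be handed to Spark with SparkConf().setAll(list(conf.items())), after which one account's bucket is addressed as s3a://... and the other as s3n://... Percent-encoding may not be enough for option 2: as noted above, a "/" in the secret can still cause problems, so regenerating the key may be the only reliable fix.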

This is going to have to be fixed in the s3a implementation at some point, as it's not only needed for cross-user auth. Once you switch to v4 AWS auth you need to specify the appropriate S3 endpoint for your region; you can't just use S3 central, but need to choose S3 Frankfurt, S3 Seoul, etc., so you won't be able to work with data across regions.
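
To make the region constraint concrete: the s3a endpoint is set through the fs.s3a.endpoint property, which, as described above, is a single setting rather than one per bucket. A small Python sketch, in which the helper name is hypothetical and the host names are assumed to follow the s3.<region>.amazonaws.com pattern:

```python
def endpoint_conf(region):
    """Return the Spark setting that points s3a at one region's endpoint."""
    return {"spark.hadoop.fs.s3a.endpoint": "s3.{}.amazonaws.com".format(region)}


frankfurt = endpoint_conf("eu-central-1")
seoul = endpoint_conf("ap-northeast-2")

# Both settings use the same key, so one configuration can hold only one of them.
merged = {**frankfurt, **seoul}
print(merged)
```

Merging the two dictionaries leaves a single entry, which is the cross-region limitation in miniature.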

Re: Access S3 buckets in multiple accounts

Posted by Eike von Seggern <ei...@sevenval.com>.
Hi Teng,

2016-09-28 10:42 GMT+02:00 Teng Qiu <te...@gmail.com>:

> Hmm, I do not believe a security group can control S3 bucket access... is
> this something new? Or do you mean an IAM role?
>

You're right, it's not security groups, but you can configure a VPC endpoint
for the EMR cluster and grant access rights to this VPC endpoint in the
foreign S3 bucket's policy, like this:

[{
  "Sid": "Allow bucket list access from vpc endpoint",
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:ListBucket",
  "Resource": "arn:aws:s3:::YourBucketName",
  "Condition": {
    "StringEquals": {
      "aws:sourceVpce": "vpce-YourId"
    }
  }
},
{
  "Sid": "Allow bucket object read access from vpc endpoint",
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::YourBucketName/*",
  "Condition": {
    "StringEquals": {
      "aws:sourceVpce": "vpce-YourId"
    }
  }
}]
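
The statements above are only the statement list; a complete bucket policy document also wraps them in an object with "Version" and "Statement" keys. A minimal Python sketch that generates the full document, assuming a hypothetical helper name and the same placeholder bucket name and VPC endpoint ID:

```python
import json


def vpce_read_policy(bucket, vpce_id):
    """Build a bucket policy granting list/read access via one VPC endpoint."""
    condition = {"StringEquals": {"aws:sourceVpce": vpce_id}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Allow bucket list access from vpc endpoint",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself.
                "Resource": "arn:aws:s3:::" + bucket,
                "Condition": condition,
            },
            {
                "Sid": "Allow bucket object read access from vpc endpoint",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # GetObject applies to the objects inside the bucket.
                "Resource": "arn:aws:s3:::" + bucket + "/*",
                "Condition": condition,
            },
        ],
    }


policy = vpce_read_policy("YourBucketName", "vpce-YourId")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Generating the document from code keeps the bucket name and endpoint ID in one place, which avoids the two statements drifting apart.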

Best

Eike

Re: Access S3 buckets in multiple accounts

Posted by Teng Qiu <te...@gmail.com>.
Hmm, I do not believe a security group can control S3 bucket access... is
this something new? Or do you mean an IAM role?

@Daniel, when running Spark on EMR you should be able to use an IAM role to
access AWS resources; you do not need to specify fs.s3a.access.key or
fs.s3a.secret.key at all. S3A is able to use the IAM role of the EMR
cluster's EC2 instances.

Then, for accessing "S3 buckets in multiple accounts", you need the following
two steps:

1) Define the policies of your IAM role with Get/Put permissions for the ARNs
of all your S3 buckets, something like this:
https://github.com/zalando-incubator/ro2key/blob/master/policy_bucket_readonly.json

2) Add this IAM role's ARN, with Get/Put permissions, to the S3 bucket policy
of each bucket in your other accounts.
Refer to "Granting cross-account bucket access to a specific IAM role" in
https://blogs.aws.amazon.com/security/post/TxK5WUJK3DG9G8/How-to-Restrict-Amazon-S3-Bucket-Access-to-a-Specific-IAM-Role

Then your cross-account S3 access should work.
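
The two steps above can be sketched in Python as a pair of policy documents. This is only an illustration under stated assumptions: the helper names, account ID, role name and bucket names are hypothetical, and the s3:ListBucket statements are an addition beyond the Get/Put permissions mentioned above, included on the assumption that jobs need to list keys.

```python
import json


def role_policy(buckets):
    """Step 1: IAM policy for the cluster's role, covering every bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": ["arn:aws:s3:::{}/*".format(b) for b in buckets],
            },
            {
                # Assumption: listing is not in the original steps.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": ["arn:aws:s3:::{}".format(b) for b in buckets],
            },
        ],
    }


def bucket_policy(bucket, role_arn):
    """Step 2: bucket policy in the other account, naming the role as principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::{}/*".format(bucket),
            },
            {
                # Assumption: listing is not in the original steps.
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": "arn:aws:s3:::{}".format(bucket),
            },
        ],
    }


# Placeholder account ID, role name and bucket names.
role_arn = "arn:aws:iam::111111111111:role/emr-ec2-role"
step1 = role_policy(["own-account-bucket", "other-account-bucket"])
step2 = bucket_policy("other-account-bucket", role_arn)
print(json.dumps(step1, indent=2))
print(json.dumps(step2, indent=2))
```

The first document is attached to the role in the cluster's account; the second is set as the bucket policy in the other account. Both sides have to allow the access for a cross-account request to succeed.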

Also worth reading is the section "When to use IAM policies vs. S3 policies"
in
https://blogs.aws.amazon.com/security/post/TxPOJBY6FE360K/IAM-policies-and-Bucket-Policies-and-ACLs-Oh-My-Controlling-Access-to-S3-Resourc


2016-09-28 10:33 GMT+02:00 Eike von Seggern <ei...@sevenval.com>:

> Hi Daniel,
>
> you can start your EMR Cluster in a dedicated security group and configure
> the foreign bucket's policy to allow read-write access from that SG.
>
> Best
>
> Eike

Re: Access S3 buckets in multiple accounts

Posted by Eike von Seggern <ei...@sevenval.com>.
Hi Daniel,

you can start your EMR Cluster in a dedicated security group and configure
the foreign bucket's policy to allow read-write access from that SG.

Best

Eike



-- 
------------------------------------------------
*Jan Eike von Seggern*
Data Scientist
------------------------------------------------
*Sevenval Technologies GmbH *

FRONT-END-EXPERTS SINCE 1999

Köpenicker Straße 154 | 10997 Berlin

office   +49 30 707 190 - 229
mail     eike.seggern@sevenval.com

www.sevenval.com

Sitz: Köln, HRB 79823
Geschäftsführung: Jan Webering (CEO), Thorsten May, Sascha Langfus,
Joern-Carlos Kuntze

*Wir erhöhen den Return On Investment bei Ihren Mobile und Web-Projekten.
Sprechen Sie uns an:*http://roi.sevenval.com/
-----------------------------------------------------------------------------------------------------------------------------------------------
FOLLOW US on

[image: Sevenval blog]
<http://sevenval.us11.list-manage1.com/track/click?u=5f2d34577b3182d6f029ebe63&id=ff955ef848&e=b789cc1a5f>

[image: sevenval on twitter]
<http://sevenval.us11.list-manage.com/track/click?u=5f2d34577b3182d6f029ebe63&id=998e8f655c&e=b789cc1a5f>
 [image: sevenval on linkedin]
<http://sevenval.us11.list-manage.com/track/click?u=5f2d34577b3182d6f029ebe63&id=7ae7d93d42&e=b789cc1a5f>[image:
sevenval on pinterest]
<http://sevenval.us11.list-manage2.com/track/click?u=5f2d34577b3182d6f029ebe63&id=f8c66fb950&e=b789cc1a5f>