Posted to user@flink.apache.org by Peter Groesbeck <pe...@gmail.com> on 2020/05/22 20:39:34 UTC

Does Flink use EMRFS?

Hi,

I'm using a Flink StreamingFileSink running in one AWS account (A) to write to
a bucket in another (B). I'm also using a SecurityConfiguration in the CFN
template to assume a role in account B, so that the files I write there are
owned by account B, which in turn allows account B to delegate access to other
AWS accounts (C and D). The files must be owned by account B because AWS
doesn't support cross-account delegation:
https://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example4.html

SecurityConfiguration:
  Type: AWS::EMR::SecurityConfiguration
  Properties:
    Name: String
    SecurityConfiguration:
      AuthorizationConfiguration:
        EmrFsConfiguration:
          RoleMappings:
            - Role: arn:aws:iam::<B-account>:role/EMR_EC2_DefaultRole
              IdentifierType: Prefix
              Identifiers:
                - s3://my-bucket/prefix/
            - Role: arn:aws:iam::<B-account>:role/EMR_DefaultRole
              IdentifierType: Prefix
              Identifiers:
                - s3://my-bucket/prefix/


I've referenced this in my Cluster block as well:

ReleaseLabel: !Ref ReleaseLabel
SecurityConfiguration: !Ref SecurityConfiguration
ScaleDownBehavior: TERMINATE_AT_TASK_COMPLETION

For some reason the files are still owned by account A. It seems like Flink
is using the old Hadoop FS implementation instead of EMRFS, which should (I
believe) grant the proper ownership so that bucket permissions can apply to
the written objects and in turn delegate read permissions to accounts C, D,
etc.
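
One way to confirm which account owns a written object is to read the owner
from the object's ACL. The following is only a rough sketch: it assumes boto3
is available and that the caller is allowed to read the object ACL, and the
helper names, the object key and the canonical-ID placeholder are hypothetical.

```python
def object_owner_id(s3_client, bucket, key):
    """Return the canonical user ID of the account that owns the object."""
    acl = s3_client.get_object_acl(Bucket=bucket, Key=key)
    return acl["Owner"]["ID"]


def owned_by(s3_client, bucket, key, expected_canonical_id):
    """True if the object's owner matches the expected canonical user ID."""
    return object_owner_id(s3_client, bucket, key) == expected_canonical_id


if __name__ == "__main__":
    import boto3  # third-party; assumed to be installed and configured

    s3 = boto3.client("s3")
    # Hypothetical key and placeholder ID; substitute real values.
    print(owned_by(s3, "my-bucket", "prefix/part-0-0",
                   "<account-B-canonical-user-id>"))
```

If this prints False for freshly written files, the write is most likely not
going through the assumed role in account B.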

Any help would be greatly appreciated.

Thanks,
Peter

Re: Does Flink use EMRFS?

Posted by Rafi Aroch <ra...@gmail.com>.
Hi Peter,

I've dealt with cross-account delegation issues in the past (unrelated to
Flink) and ran into the same ownership problems (accounts can't access data,
account A 'loses' access to its own data).

My two cents:

   - The account that produces the data (A) should be the ONLY OWNER of
   that data.
   - The policy to access the data should be managed in ONE place only, the
   producing account (A).
   - If you wish to expose access to your data to other accounts (B, C, D),
   the best approach would be to:
      - In account A - Create a policy that defines the access you wish to
      expose. For example, read access to a specific bucket and path:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}

      - In account A - Create a role and define which accounts you allow to
      AssumeRole (this lets you control whether ALL or only specific users of
      the other account can access the data):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::account-B:root",
          "arn:aws:iam::account-C:root",
          "arn:aws:iam::account-D:root"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}


      - In account A - Attach the policy to the role.
      - In the other accounts - THEY control which users have access to the
      data by granting AssumeRole permission on the role above from account A.
      This can be unrestricted (by *) or restricted to a specific role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::<account-A-id>:role/external-access-role"
        }
    ]
}


Now when a user assumes the external-access-role role, they are granted the
specified access, with no need to play around with ownership configurations.
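
As a rough sketch of what that looks like from a consumer in account B, C or
D, assuming boto3 is available: the helper name and session name below are my
own, and the role ARN and bucket name are the placeholders from the policies
above.

```python
def assume_external_role(sts_client, role_arn,
                         session_name="external-access-session"):
    """Call STS AssumeRole and return credential kwargs for a boto3 client."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


if __name__ == "__main__":
    import boto3  # third-party; assumed to be installed and configured

    # Placeholder ARN; replace <account-A-id> with the real account ID.
    role_arn = "arn:aws:iam::<account-A-id>:role/external-access-role"
    kwargs = assume_external_role(boto3.client("sts"), role_arn)

    # The S3 client now acts with the temporary credentials of the role.
    s3 = boto3.client("s3", **kwargs)
    for obj in s3.list_objects_v2(Bucket="bucket-name").get("Contents", []):
        print(obj["Key"])
```

The credentials returned are temporary, so a long-running reader would need
to call AssumeRole again when they expire.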

Hope this helps,
Rafi

