Posted to user@flink.apache.org by Vamshi G <vg...@salesforce.com> on 2021/08/10 00:36:24 UTC

s3 access denied with flink-s3-fs-presto

We are using Flink version 1.13.0 on Kubernetes.
For checkpointing we have configured flink-s3-fs-presto as the fs.s3 filesystem.
We have enabled SSE on our buckets with a KMS CMK.

flink-conf.yaml is configured as below.
s3.entropy.key: _entropy_
s3.entropy.length: 4
s3.path.style.access: true
s3.ssl.enabled: true
s3.sse.enabled: true
s3.sse.type: KMS
s3.sse.kms-key-id: <ARN of keyid>
s3.iam-role: <IAM role with read/write access to bucket>
s3.endpoint: <bucketname>.s3-us-west-2.amazonaws.com
s3.credentials-provider: com.amazonaws.auth.profile.ProfileCredentialsProvider
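The entropy settings above tell Flink to replace the `_entropy_` marker in the checkpoint path with random characters, which spreads checkpoint objects across S3 key prefixes. A minimal sketch of that substitution (the helper below is illustrative, not Flink's actual implementation):

```python
import random
import string

def inject_entropy(path: str, key: str = "_entropy_", length: int = 4) -> str:
    """Replace the entropy marker in an S3 path with random alphanumeric chars."""
    entropy = "".join(random.choices(string.ascii_lowercase + string.digits, k=length))
    return path.replace(key, entropy)

# e.g. "s3://bucket/_entropy_/checkpoints/chk-1" -> "s3://bucket/x7k2/checkpoints/chk-1"
print(inject_entropy("s3://bucket/_entropy_/checkpoints/chk-1"))
```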

However, PUT operations on the bucket are resulting in access denied error.
Access policies for the role are checked and works fine when checked with
CLI.
Also, can't get to see debug logs from presto s3 lib, is there a way to
enable logger for presto airlift logging?

Any inputs on above issue?

Re: s3 access denied with flink-s3-fs-presto

Posted by Parag Somani <so...@gmail.com>.
Hello,

I have successfully been able to store data in an S3 bucket. Earlier, I had
a similar issue. What you need to confirm:
1. The S3 bucket is created with read/write access (irrespective of whether
it is MinIO or AWS S3).
2. The "flink/opt/flink-s3-fs-presto-1.14.0.jar" jar is copied to the plugin
directory "flink/plugins/s3-fs-presto".
3. Add the following configuration (in the configuration file or
programmatically, either way):

    state.checkpoints.dir: s3://<bucket-name>/checkpoints
    state.backend.fs.checkpointdir: s3://<bucket-name>/checkpoints/
    s3.path-style: true
    s3.path.style.access: true
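Step 2 above can be sketched as a small script (the paths are assumptions based on a standard Flink distribution layout, where filesystem plugins are discovered from a dedicated subdirectory under plugins/):

```python
import glob
import shutil
from pathlib import Path

def install_presto_plugin(flink_home: str) -> Path:
    """Copy the flink-s3-fs-presto jar from opt/ into plugins/s3-fs-presto/,
    where Flink looks for filesystem plugins at startup."""
    plugin_dir = Path(flink_home) / "plugins" / "s3-fs-presto"
    plugin_dir.mkdir(parents=True, exist_ok=True)
    for jar in glob.glob(str(Path(flink_home) / "opt" / "flink-s3-fs-presto-*.jar")):
        shutil.copy(jar, plugin_dir)
    return plugin_dir
```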

On Wed, Oct 27, 2021 at 2:47 AM Vamshi G <vg...@salesforce.com> wrote:


-- 
Regards,
Parag Surajmal Somani.

Re: s3 access denied with flink-s3-fs-presto

Posted by Vamshi G <vg...@salesforce.com>.
s3a with the Hadoop S3 filesystem works fine for us with STS assume-role
credentials and with KMS.
Below is how our Hadoop s3a configs look. Since the endpoint is
globally whitelisted, we don't explicitly set the endpoint.

fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
fs.s3a.assumed.role.credentials.provider: com.amazonaws.auth.profile.ProfileCredentialsProvider
fs.s3a.assumed.role.arn: arn:aws:iam::<account>:role/<iam_role>
fs.s3a.server-side-encryption-algorithm: SSE-KMS
fs.s3a.server-side-encryption.key: arn:aws:kms:<region>:<account>:key/<key-alias>


However, for checkpointing we definitely want to use Presto S3, and we just
could not make it work. FINE logging on presto-hive is not helping either,
as the library uses the Airlift logger.
Also, based on the code here
https://github.com/prestodb/presto/blob/2aeedb944fc8b47bfe1cad78732d6dd2308ee9ad/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java#L821,
PrestoS3FileSystem does switch to IAM role credentials if one is provided.

Has anyone successfully used the Presto S3 filesystem in Flink v1.13.0?


Thanks,
Vamshi


On Mon, Aug 16, 2021 at 3:59 AM David Morávek <dm...@apache.org> wrote:


Re: s3 access denied with flink-s3-fs-presto

Posted by David Morávek <dm...@apache.org>.
Hi Vamshi,

From your configuration I'm guessing that you're using Amazon S3 (not an
alternative implementation such as MinIO).

Two comments:
- *s3.endpoint* should not contain the bucket (the bucket is part of your S3
path, e.g. *s3://<bucket>/<file>*)
- "*s3.path.style.access*: true" is only correct for third-party
implementations such as MinIO / Swift, which have the bucket defined in the
URL path instead of the subdomain
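The two addressing modes can be illustrated side by side (a sketch; the helper below is illustrative, not an AWS SDK API):

```python
def s3_urls(bucket: str, key: str, region: str = "us-west-2") -> dict:
    """Contrast virtual-hosted-style addressing (the AWS default, bucket in
    the subdomain) with path-style addressing (bucket in the URL path, as
    used by MinIO / Swift)."""
    return {
        "virtual-hosted": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
        "path-style": f"https://s3.{region}.amazonaws.com/{bucket}/{key}",
    }
```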

You can find some information about connecting to s3 in Flink docs [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/filesystems/s3/

Best,
D.


On Tue, Aug 10, 2021 at 2:37 AM Vamshi G <vg...@salesforce.com> wrote:
