You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/09 09:56:54 UTC
[GitHub] [airflow] shlomi-viz opened a new issue #18111: HeadObject error when trying to upload file to S3
shlomi-viz opened a new issue #18111:
URL: https://github.com/apache/airflow/issues/18111
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
```apache-airflow-providers-amazon==1.4.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-http==1.1.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-postgres==1.0.1
apache-airflow-providers-sftp==1.1.1
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==1.3.0
```
### Apache Airflow version
2.0.2
### Operating System
Mac and Linux
### Deployment
MWAA
### Deployment details
_No response_
### What happened
When trying to upload a file to another AWS account using the `load_file` function, I got the following
```
An error occurred (403) when calling the HeadObject operation: Forbidden
```
looking at source code the error is from
```
@provide_bucket_name
@unify_bucket_name_and_key
def check_for_key(self, key: str, bucket_name: Optional[str] = None) -> bool:
"""
Checks if a key exists in a bucket
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
:return: True if the key exists and False if not.
:rtype: bool
"""
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
return False
else:
raise e
```
As I understand it I'm getting this error since I don't a `List` permissions on the bucket in the other account
### What you expected to happen
AirFlow should try and upload the file, even if it already exists. In this section of the code we can add another check for the response code we get when no `List` permissions
```python
except ClientError as e:
if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
return False
else:
raise e
````
### How to reproduce
Use the following command to upload the file to an S3 bucket that you only have `Get` and `Put` permissions.
```python
s3_hook.load_file(local_file_path, bucket_name=bucket_name, key=s3_key, acl_policy='bucket-owner-full-control')
````
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937704490
> IMO, It's not reasonable for us to swallow the permission error.
Fully agree with @xinbinhuang. The 404 is explicit "I have permissions, but the file does not exist". This is entirely different than "The file exist but I have no permissions". IMHO Replace=True does exactly what it should do - replaces stuff regardless if the file is there or whether you even have permission to list them. Replace = false should fail if you cannot check if the file is there - even if you have no list permission - generally if you have not enough permissions to check if the file is there, and if you want to use "replace = False", I'd expect to get exactly what you get - permission error (precisely because you have no permissions to check it). So for me this works as intended and expected. I will convert it into discussion if you want to further discuss it and close the PR, but I think it's just wrong to add 403 here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] xinbinhuang edited a comment on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895
> I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
>
> My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
> return False
> else:
> raise e
> ```
>
> to:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
> return False
> else:
> raise e
> ```
>
> shortest PR ever 😆 I will try to submit one next week.
I believe the `replace=True` flag will just upload the file regardless if the key/file exists or not. We don't call the `check_for_key` function when `replace=True`. Have you tried it?
IMO, It's not reasonable for us to swallow the permission error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #18111:
URL: https://github.com/apache/airflow/issues/18111
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] xinbinhuang commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895
> I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
>
> My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
> return False
> else:
> raise e
> ```
>
> to:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
> return False
> else:
> raise e
> ```
>
> shortest PR ever 😆 I will try to submit one next week.
I believe the `replace` flag will just upload the file regardless if the key exists or not. Have you tried it? We don't call the `check_for_key` function when `replace=True`.
IMO, It's not reasonable for us to swallow the permission error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] xinbinhuang edited a comment on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895
> I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
>
> My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
> return False
> else:
> raise e
> ```
>
> to:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
> return False
> else:
> raise e
> ```
>
> shortest PR ever 😆 I will try to submit one next week.
I believe the `replace=True` flag will just upload the file regardless if the key/file exists or not. We don't call the `check_for_key` function when `replace=True`. Have you tried it?
IMO, It's not reasonable for us to swallow the permission error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-916009650
I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
```python
except ClientError as e:
if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
return False
else:
raise e
```
to:
```python
except ClientError as e:
if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
return False
else:
raise e
```
shortest PR ever 😆 I will try to submit one next week.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937704490
> IMO, It's not reasonable for us to swallow the permission error.
Fully agree with @xinbinhuang. The 404 is explicit "I have permissions, but the file does not exist". This is entirely different than "The file exist but I have no permissions". IMHO Replace=True does exactly what it should do - replaces stuff regardless if the file is there or whether you even have permission to list them. Replace = false should fail if you cannot check if the file is there - even if you have no list permission - generally if you have not enough permissions to check if the file is there, and if you want to use "replace = False", I'd expect to get exactly what you get - permission error (precisely because you have no permissions to check it). So for me this works as intended and expected. I will convert it into discussion if you want to further discuss it and close the PR, but I think it's just wrong to add 403 here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937513407
It will work with `replace=True`, but this miss-leading. If we don't want to `swallow the permission error` why do we check for the 404 error?
I think that in both cases (with `List` permissions and without) the result should be the same.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937513407
It will work with `replace=True`, but this miss-leading. If we don't want to `swallow the permission error` why do we check for the 404 error?
I think that in both cases (with `List` permissions and without) the result should be the same.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-918562444
I created a PR https://github.com/apache/airflow/pull/18201, what is the process to get it reviewed? Thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #18111:
URL: https://github.com/apache/airflow/issues/18111
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] xinbinhuang edited a comment on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895
> I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
>
> My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
> return False
> else:
> raise e
> ```
>
> to:
>
> ```python
> except ClientError as e:
> if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
> return False
> else:
> raise e
> ```
>
> shortest PR ever 😆 I will try to submit one next week.
I believe the `replace=True` flag will just upload the file regardless if the key exists or not. We don't call the `check_for_key` function when `replace=True`. Have you tried it?
IMO, It's not reasonable for us to swallow the permission error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] eladkal commented on issue #18111: HeadObject error when trying to upload file to S3
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-915986024
> AirFlow should try and upload the file, even if it already exists
`load_file` has `replace` parameter (default False)
https://github.com/apache/airflow/blob/43f595fe1b8cd6f325d8535c03ee219edbf4a559/airflow/providers/amazon/aws/hooks/s3.py#L499-L500
You don't set it in your example so it will not try to overwrite existed key.
Is your suggestion only catch the error from the API and change it to an Airflow error?
You are always welcome to open PR directly with your suggested code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org