You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/09 09:56:54 UTC

[GitHub] [airflow] shlomi-viz opened a new issue #18111: HeadObject error when trying to upload file to S3

shlomi-viz opened a new issue #18111:
URL: https://github.com/apache/airflow/issues/18111


   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   ```apache-airflow-providers-amazon==1.4.0
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-http==1.1.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-postgres==1.0.1
   apache-airflow-providers-sftp==1.1.1
   apache-airflow-providers-sqlite==2.0.1
   apache-airflow-providers-ssh==1.3.0
   ```
   
   ### Apache Airflow version
   
   2.0.2
   
   ### Operating System
   
   Mac and Linux
   
   ### Deployment
   
   MWAA
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When trying to upload a file to another AWS account using the `load_file` function, I got the following 
   ```
   An error occurred (403) when calling the HeadObject operation: Forbidden
   ```
   looking at source code the error is from 
   ```
       @provide_bucket_name
       @unify_bucket_name_and_key
       def check_for_key(self, key: str, bucket_name: Optional[str] = None) -> bool:
           """
           Checks if a key exists in a bucket
   
           :param key: S3 key that will point to the file
           :type key: str
           :param bucket_name: Name of the bucket in which the file is stored
           :type bucket_name: str
           :return: True if the key exists and False if not.
           :rtype: bool
           """
           try:
               self.get_conn().head_object(Bucket=bucket_name, Key=key)
               return True
           except ClientError as e:
               if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
                   return False
               else:
                   raise e
   ```
   
   As I understand it I'm getting this error since I don't a `List` permissions on the bucket in the other account
   
   ### What you expected to happen
   
   AirFlow should try and upload the file, even if it already exists. In this section of the code we can add another check for the response code we get when no `List` permissions
   
   ```python
           except ClientError as e:
               if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
                   return False
               else:
                   raise e
   ````
   
   ### How to reproduce
   
   Use the following command to upload the file to an S3 bucket that you only have `Get` and `Put` permissions.
   ```python 
   s3_hook.load_file(local_file_path, bucket_name=bucket_name, key=s3_key, acl_policy='bucket-owner-full-control')
   ````
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937704490


   > IMO, It's not reasonable for us to swallow the permission error.
   
   Fully agree with @xinbinhuang. The 404 is explicit "I have permissions, but the file does not exist". This is entirely different than "The file exist but I have no permissions". IMHO Replace=True does exactly what it should do - replaces stuff regardless if the file is there or whether you even have permission to list them. Replace = false should fail if you cannot check if the file is there - even if you have no list permission - generally if you have not enough permissions to check if the file is there, and if you want to use "replace = False", I'd expect to get exactly what you get - permission error (precisely because you have no permissions to check it). So for me this works as intended and expected. I will convert it into discussion if you want to further discuss it and close the PR, but I think it's just wrong to add 403 here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895


   > I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
   > 
   > My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > to:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > shortest PR ever 😆 I will try to submit one next week.
   
   I believe the `replace=True` flag will just upload the file regardless if the key/file exists or not.  We don't call the `check_for_key` function when `replace=True`. Have you tried it?
   
   IMO, It's not reasonable for us to swallow the permission error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #18111:
URL: https://github.com/apache/airflow/issues/18111


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895


   > I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
   > 
   > My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > to:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > shortest PR ever 😆 I will try to submit one next week.
   
   I believe the `replace` flag will just upload the file regardless if the key exists or not. Have you tried it? We don't call the `check_for_key` function when `replace=True`.
   
   IMO, It's not reasonable for us to swallow the permission error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895


   > I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
   > 
   > My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > to:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > shortest PR ever 😆 I will try to submit one next week.
   
   I believe the `replace=True` flag will just upload the file regardless if the key/file exists or not.  We don't call the `check_for_key` function when `replace=True`. Have you tried it?
   
   IMO, It's not reasonable for us to swallow the permission error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-916009650


   I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists. 
   
   My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
   ```python
           except ClientError as e:
               if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
                   return False
               else:
                   raise e
   ```
   
   to:
   
   ```python 
           except ClientError as e:
               if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
                   return False
               else:
                   raise e
   ```
   
   shortest PR ever 😆  I will try to submit one next week. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937704490


   > IMO, It's not reasonable for us to swallow the permission error.
   
   Fully agree with @xinbinhuang. The 404 is explicit "I have permissions, but the file does not exist". This is entirely different than "The file exist but I have no permissions". IMHO Replace=True does exactly what it should do - replaces stuff regardless if the file is there or whether you even have permission to list them. Replace = false should fail if you cannot check if the file is there - even if you have no list permission - generally if you have not enough permissions to check if the file is there, and if you want to use "replace = False", I'd expect to get exactly what you get - permission error (precisely because you have no permissions to check it). So for me this works as intended and expected. I will convert it into discussion if you want to further discuss it and close the PR, but I think it's just wrong to add 403 here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937513407


   It will work with `replace=True`, but this miss-leading. If we don't want to `swallow the permission error` why do we check for the 404 error? 
   I think that in both cases (with `List` permissions and without) the result should be the same. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-937513407


   It will work with `replace=True`, but this miss-leading. If we don't want to `swallow the permission error` why do we check for the 404 error? 
   I think that in both cases (with `List` permissions and without) the result should be the same. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] shlomi-viz commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
shlomi-viz commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-918562444


   I created a PR https://github.com/apache/airflow/pull/18201, what is the process to get it reviewed? Thanks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #18111:
URL: https://github.com/apache/airflow/issues/18111


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-936953895


   > I want it to be able to upload a file in any case (file already exists or not), I know about the `replace` flag, but I also want to cover the case when the file doesn't exists.
   > 
   > My suggestion is to catch the error and allow the upload, without adding another error. It will be something like, updating:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > to:
   > 
   > ```python
   >         except ClientError as e:
   >             if e.response["ResponseMetadata"]["HTTPStatusCode"] IN (404, 403):
   >                 return False
   >             else:
   >                 raise e
   > ```
   > 
   > shortest PR ever 😆 I will try to submit one next week.
   
   I believe the `replace=True` flag will just upload the file regardless if the key exists or not.  We don't call the `check_for_key` function when `replace=True`. Have you tried it?
   
   IMO, It's not reasonable for us to swallow the permission error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #18111: HeadObject error when trying to upload file to S3

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #18111:
URL: https://github.com/apache/airflow/issues/18111#issuecomment-915986024


   > AirFlow should try and upload the file, even if it already exists
   
   `load_file` has `replace` parameter (default False)
   https://github.com/apache/airflow/blob/43f595fe1b8cd6f325d8535c03ee219edbf4a559/airflow/providers/amazon/aws/hooks/s3.py#L499-L500 
   You don't set it in your example so it will not try to overwrite existed key.
   
   Is your suggestion only catch the error from the API and change it to an Airflow error?
   You are always welcome to open PR directly with your suggested code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org