Posted to commits@airflow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/06/18 13:35:00 UTC

[jira] [Commented] (AIRFLOW-4809) s3_delete_objects_operator fails on empty list of keys

    [ https://issues.apache.org/jira/browse/AIRFLOW-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866603#comment-16866603 ] 

ASF GitHub Bot commented on AIRFLOW-4809:
-----------------------------------------

szczeles commented on pull request #5428: AIRFLOW-4809 | s3_delete_objects_operator should not fail on empty list of keys
URL: https://github.com/apache/airflow/pull/5428
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-4809) issues and references them in the PR title. 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   When s3_delete_objects_operator is used dynamically (for example, when the list of keys comes from s3_list_operator via XCom), the list of keys may be empty. In my case this happens when chained operators remove old files from S3 and there are no old files yet, because it is the very first run of the DAG.
   
   When `keys` is empty, the hook raises an exception (via boto3):
   
   ```
   [2019-06-18 13:23:53,790] {{base_task_runner.py:101}} INFO - Job 115: Subtask delete_old_files [2019-06-18 13:23:53,790] {{cli.py:517}} INFO - Running <TaskInstance: xxxxxx.delete_old_files 2019-06-17T00:00:00+00:00 [running]> on host 82f571f444f5
   [2019-06-18 13:23:56,199] {{__init__.py:1580}} ERROR - An error occurred (MalformedXML) when calling the DeleteObjects operation: The XML you provided was not well-formed or did not validate against our published schema
   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/s3_delete_objects_operator.py", line 80, in execute
       response = s3_hook.delete_objects(bucket=self.bucket, keys=self.keys)
     File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 542, in delete_objects
       Delete=delete_dict)
     File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
       return self._make_api_call(operation_name, kwargs)
     File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
       raise error_class(parsed_response, operation_name)
   botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the DeleteObjects operation: The XML you provided was not well-formed or did not validate against our published schema
   ```
   The provided patch changes the operator's behavior: if there is nothing to delete from S3, it simply returns.
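
The early return described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `RecordingHook` and `delete_if_any` are hypothetical names, and the stand-in hook only mimics the failure on an empty key list instead of contacting S3.

```python
class RecordingHook:
    """Hypothetical stand-in for an S3 hook; records calls instead of contacting S3."""

    def __init__(self):
        self.calls = []

    def delete_objects(self, bucket, keys):
        if not keys:
            # Mimics the failure in the traceback: an empty delete request is rejected.
            raise ValueError("MalformedXML: empty list of keys")
        self.calls.append((bucket, list(keys)))
        return {"Deleted": [{"Key": key} for key in keys]}


def delete_if_any(hook, bucket, keys):
    """Call the hook only when there is at least one key to delete."""
    if not keys:
        # Nothing to delete (e.g. first DAG run): skip the hook call entirely.
        return None
    return hook.delete_objects(bucket=bucket, keys=keys)


hook = RecordingHook()
delete_if_any(hook, "example-bucket", [])                 # no call, no error
delete_if_any(hook, "example-bucket", ["old/file.csv"])   # delegates to the hook
```

Guarding in the caller rather than in the hook keeps the hook's behavior unchanged for other users; whether the actual patch places the check in the operator's `execute` in exactly this way is an assumption here.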
   
   ### Tests
   
   - [x] My PR adds the following unit tests 
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and classes in the PR contain docstrings that explain what they do
     - If you implement backwards-incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 


> s3_delete_objects_operator fails on empty list of keys
> ------------------------------------------------------
>
>                 Key: AIRFLOW-4809
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4809
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, operators
>    Affects Versions: 1.10.3
>            Reporter: Mariusz Strzelecki
>            Assignee: Mariusz Strzelecki
>            Priority: Major
>
> When s3_delete_objects_operator is used dynamically (for example, when the list of keys comes from s3_list_operator via XCom), the list of keys may be empty. In my case this happens when chained operators remove old files from S3 and there are no old files yet, because it is the very first run of the DAG.
> When `keys` is empty, the hook raises an exception (via boto3):
> {noformat}
> [2019-06-18 13:23:53,790] {{base_task_runner.py:101}} INFO - Job 115: Subtask delete_old_files [2019-06-18 13:23:53,790] {{cli.py:517}} INFO - Running <TaskInstance: xxxxxx.delete_old_files 2019-06-17T00:00:00+00:00 [running]> on host 82f571f444f5
> [2019-06-18 13:23:56,199] {{__init__.py:1580}} ERROR - An error occurred (MalformedXML) when calling the DeleteObjects operation: The XML you provided was not well-formed or did not validate against our published schema
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/s3_delete_objects_operator.py", line 80, in execute
>     response = s3_hook.delete_objects(bucket=self.bucket, keys=self.keys)
>   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 542, in delete_objects
>     Delete=delete_dict)
>   File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
>     return self._make_api_call(operation_name, kwargs)
>   File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
>     raise error_class(parsed_response, operation_name)
> botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the DeleteObjects operation: The XML you provided was not well-formed or did not validate against our published schema
> {noformat}
> I already have a patch that checks whether there is anything to delete and, if not, simply returns from the operator.


