You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/27 14:40:27 UTC

[GitHub] [airflow] lidalei opened a new pull request, #25342: Fix BigQueryInsertJobOperator cancel_on_kill

lidalei opened a new pull request, #25342:
URL: https://github.com/apache/airflow/pull/25342

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   This PR fixes an issue with BigQueryInsertJobOperator. If the task reaches timeout set by task `execution_timeout`, `on_kill` will be called but the `self.job_id` is None. This is because the function `_submit_job` is a blocking call but `self.job_id` is only set after it. This PR is hugely inspired by https://github.com/apache/airflow/pull/22955.
   
   <!--
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   --->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25342: Fix BigQueryInsertJobOperator cancel_on_kill

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25342:
URL: https://github.com/apache/airflow/pull/25342#issuecomment-1199203518

   Some errors though


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] lidalei commented on pull request #25342: Fix BigQueryInsertJobOperator cancel_on_kill

Posted by GitBox <gi...@apache.org>.
lidalei commented on PR #25342:
URL: https://github.com/apache/airflow/pull/25342#issuecomment-1199455724

   > Some errors though
   
   Fixed the failed test case. The `Conflict` exception will be raised when we call `_begin`.
   ```
   Traceback (most recent call last):
     File "/Users/dalei/.pyenv/versions/3.8.10/lib/python3.8/code.py", line 90, in runcode
       exec(code, self.locals)
     File "<input>", line 1, in <module>
     File "/Users/dalei/go/src/github.com/lidalei/airflow/airflow/providers/google/common/hooks/base_google.py", line 463, in inner_wrapper
       return func(self, *args, **kwargs)
     File "/Users/dalei/go/src/github.com/lidalei/airflow/airflow/providers/google/cloud/hooks/bigquery.py", line 1542, in insert_job
       job._begin()
     File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1298, in _begin
       super(QueryJob, self)._begin(client=client, retry=retry, timeout=timeout)
     File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 510, in _begin
       api_response = client._call_api(
     File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 759, in _call_api
       return call()
     File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
       return retry_target(
     File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
       return target()
     File "/Users/dalei/go/src/github.com/lidalei/airflow/venv/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
       raise exceptions.from_http_response(response)
   google.api_core.exceptions.Conflict: 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/xxx/jobs?prettyPrint=false: Already Exists: Job xxx:EU.abc_test
   Location: EU
   Job ID: abc_test
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #25342: Fix BigQueryInsertJobOperator cancel_on_kill

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #25342:
URL: https://github.com/apache/airflow/pull/25342#issuecomment-1205328475

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #25342: Fix BigQueryInsertJobOperator cancel_on_kill

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #25342:
URL: https://github.com/apache/airflow/pull/25342


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #25342: Fix BigQueryInsertJobOperator cancel_on_kill

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #25342:
URL: https://github.com/apache/airflow/pull/25342#discussion_r933313569


##########
airflow/providers/google/cloud/operators/bigquery.py:
##########
@@ -2221,10 +2217,16 @@ def execute(self, context: Any):
                             BigQueryTableLink.persist(**persist_kwargs)
 
         self.job_id = job.job_id
-        return job.job_id
+        # Wait for the job to complete
+        job.result(timeout=self.result_timeout, retry=self.result_retry)
+        self._handle_job_error(job)
+
+        return self.job_id
 
     def on_kill(self) -> None:
         if self.job_id and self.cancel_on_kill:
             self.hook.cancel_job(  # type: ignore[union-attr]
                 job_id=self.job_id, project_id=self.project_id, location=self.location
             )
+        else:
+            self.log.info(f'Skipping to cancel job: {self.project_id}:{self.location}.{self.job_id}')

Review Comment:
   ```suggestion
               self.log.info('Skipping to cancel job: %s:%s.%s', self.project_id, self.location, self.job_id)
   ```
   We should [try to avoid f-strings in logging statements](https://blog.pilosus.org/posts/2020/01/24/python-f-strings-in-logging/) and use %-format style instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org