You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/18 20:58:31 UTC

[GitHub] [airflow] ephraimbuddy opened a new pull request #15929: Fail tasks in scheduler when executor reports they failed

ephraimbuddy opened a new pull request #15929:
URL: https://github.com/apache/airflow/pull/15929


   When a task fails in executor while still queued in scheduler, the executor reports
   this failure but scheduler doesn't change the task state resulting in the task
   being queued until the scheduler is restarted. This commit fixes it by ensuring
   that when a task is reported to have failed in the executor, the task is failed
   in scheduler
   
   
   This can be reproduced with CeleryExecutor in breeze
   1. Start breeze without redis integration
   2. start the webserver and scheduler
   3. Run a dag e.g example_bash_operator.py
   4. The CeleryExecutor will raise exceptions and fails the task
   4. Notice that the tasks remain queued
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy closed pull request #15929: Fail tasks in scheduler when executor reports they failed

Posted by GitBox <gi...@apache.org>.
ephraimbuddy closed pull request #15929:
URL: https://github.com/apache/airflow/pull/15929


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy merged pull request #15929: Fail tasks in scheduler when executor reports they failed

Posted by GitBox <gi...@apache.org>.
ephraimbuddy merged pull request #15929:
URL: https://github.com/apache/airflow/pull/15929


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #15929: Fail tasks in scheduler when executor reports they failed

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15929:
URL: https://github.com/apache/airflow/pull/15929#discussion_r634746031



##########
File path: tests/jobs/test_scheduler_job.py
##########
@@ -907,13 +907,13 @@ def test_process_executor_events(self, mock_stats_incr, mock_task_callback):
 
         self.scheduler_job._process_executor_events(session=session)
         ti1.refresh_from_db()
-        assert ti1.state == State.QUEUED
+        assert ti1.state == State.FAILED
         mock_task_callback.assert_called_once_with(
             full_filepath='/test_path1/',
             simple_task_instance=mock.ANY,
             msg='Executor reports task instance '
-            '<TaskInstance: test_process_executor_events.dummy_task 2016-01-01 00:00:00+00:00 [queued]> '
-            'finished (failed) although the task says its queued. (Info: None) '
+            '<TaskInstance: test_process_executor_events.dummy_task 2016-01-01 00:00:00+00:00 [failed]> '
+            'finished (failed) although the task says its failed. (Info: None) '

Review comment:
       Hmmm, this log message now doesn't make any sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #15929: Fail tasks in scheduler when executor reports they failed

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15929:
URL: https://github.com/apache/airflow/pull/15929#discussion_r634746943



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1239,6 +1239,8 @@ def _process_executor_events(self, session: Session = None) -> int:
                     "task says its %s. (Info: %s) Was the task killed externally?"
                 )
                 self.log.error(msg, ti, state, ti.state, info)
+                self.log.info('Setting task instance %s state to %s as reported by executor', ti, state)
+                ti.set_state(state)

Review comment:
       I think if you moved these two lines to below the `request =` then the msg would make more sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15929: Fail tasks in scheduler when executor reports they failed

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #15929:
URL: https://github.com/apache/airflow/pull/15929#discussion_r634793823



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1239,6 +1239,8 @@ def _process_executor_events(self, session: Session = None) -> int:
                     "task says its %s. (Info: %s) Was the task killed externally?"
                 )
                 self.log.error(msg, ti, state, ti.state, info)
+                self.log.info('Setting task instance %s state to %s as reported by executor', ti, state)
+                ti.set_state(state)

Review comment:
       Yes. Done. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15929: Fail tasks in scheduler when executor reports they failed

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15929:
URL: https://github.com/apache/airflow/pull/15929#issuecomment-844556804


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org