Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/15 19:39:25 UTC

[GitHub] [airflow] notatallshaw-work opened a new issue, #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

notatallshaw-work opened a new issue, #25728:
URL: https://github.com/apache/airflow/issues/25728

   ### Apache Airflow version
   
   2.3.3
   
   ### What happened
   
   After upgrading from Airflow 2.1.3 to Airflow 2.3.3 we have an issue with our sensors that have `mode='reschedule'`. Using `TimeSensor` as an example:
   
   1. It executes as normal on the first run
   2. It detects it is not the correct time yet and marks itself "UP_FOR_RESCHEDULE" (usually to be rescheduled for 5 minutes in the future)
   3. When the reschedule time arrives, it is only marked as "QUEUED" and never actually runs again. The error in the log:
   `[2022-08-15 00:01:11,027] {base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='TestDAG', task_id='testTASK', run_id='scheduled__2022-08-12T04:00:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)`
   
   Looking at the relevant code (https://github.com/apache/airflow/blob/2.3.3/airflow/executors/base_executor.py#L215), it seems that the task's key (the `TaskInstanceKey` in the log line) is never removed from `self.running`.
   
   ### What you think should happen instead
   
   Tasks marked "UP_FOR_RESCHEDULE" should run again once their reschedule time arrives.
   
   ### How to reproduce
   
   1. Airflow 2.3.3 from Docker
   2. Celery 5.2.7 with Redis backend
   3. MySQL 8
   4. Airflow Timezone set to America/New_York
   5. Have a normal (non-async) sensor with `mode='reschedule'` that needs to reschedule itself
   
   ### Operating System
   
   Fedora 29
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   The symptoms in this discussion sound the same, but no one has replied to it yet: https://github.com/apache/airflow/discussions/25651
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] notatallshaw-work commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
notatallshaw-work commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1218021011

   Thanks, I'll read through this and see what I can do about the logs. They're big, so I will need to cut them down to the relevant part, and I'd also need to get management sign-off. If I'm able to reproduce the issue outside my company, that will make the process a lot simpler.




[GitHub] [airflow] notatallshaw-work commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
notatallshaw-work commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1215848416

   It appears this was never an issue before 2.3.0 because the CeleryExecutor implemented its own `trigger_tasks` logic until this PR landed: https://github.com/apache/airflow/pull/23016




[GitHub] [airflow] notatallshaw-work commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
notatallshaw-work commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1216672889

   I'm a bit confused by this part of this loop (comment removed for clarity):
   
   ```python
   for _ in range(min((open_slots, len(self.queued_tasks)))):
       key, (command, _, queue, ti) = sorted_queue.pop(0)
   
       if key in self.running:
           attempt = self.attempts[key]
           if attempt < QUEUEING_ATTEMPTS - 1:
               self.attempts[key] = attempt + 1
               self.log.info("task %s is still running", key)
               continue
   ```
   
   There's no sleep and no external call to check the status. If `self.running` is being updated on another thread, the only real chance it has is when `self.log` is called. If `self.running` is being updated by an asynchronous loop somewhere, it has to rely on the GIL giving it a chance to update while a tight non-asynchronous bit of code is running, which seems unlikely?
   
   Reading the comments, the situation it is supposed to catch is when "the task has been killed externally and not yet been marked as failed". Why does it not check the status of the task instead? In our case the status of the task is "UP_FOR_RESCHEDULE", and it doesn't make sense to me that the executor would be unsure whether a task in that status is running.
   
   @malthe @potiuk Sorry to ping you directly, but I would be happy to help test and/or contribute if you could give any hints or clarity on my questions.
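
A minimal sketch of how the quoted loop could behave over time, assuming each pass over the queue is driven by a separate executor heartbeat, so that `continue` defers the task to the next pass rather than retrying within the same one. `ToyExecutor`, `simulate`, and the value `QUEUEING_ATTEMPTS = 5` are hypothetical (the constant is chosen only so the final message matches the "after 4 attempts" log line in the issue); this is not Airflow's `BaseExecutor`.

```python
from collections import Counter

# Assumption: chosen so the final message reads "after 4 attempts",
# matching the log line quoted in the issue.
QUEUEING_ATTEMPTS = 5


class ToyExecutor:
    """Hypothetical stand-in for an executor; not Airflow's BaseExecutor."""

    def __init__(self):
        self.queued_tasks = {}     # key -> command
        self.running = set()       # keys the executor believes are running
        self.attempts = Counter()  # key -> number of deferred queue attempts
        self.log = []              # collected messages instead of a real logger

    def change_state(self, key):
        """A finished-task event is what removes a key from `running`."""
        self.running.discard(key)

    def trigger_tasks(self, open_slots):
        """One pass over the queue; assumed to run once per heartbeat."""
        started = []
        for key in list(self.queued_tasks)[:open_slots]:
            if key in self.running:
                attempt = self.attempts[key]
                if attempt < QUEUEING_ATTEMPTS - 1:
                    self.attempts[key] = attempt + 1
                    self.log.append(f"task {key} is still running")
                    continue  # retried on the next pass, not in this one
                # Give up and drop the task from the queue.
                self.log.append(
                    f"could not queue task {key} "
                    f"(still running after {attempt} attempts)"
                )
                del self.attempts[key]
                del self.queued_tasks[key]
            else:
                del self.queued_tasks[key]
                self.running.add(key)
                started.append(key)
        return started


def simulate(state_event_arrives):
    """Run a sensor once, then try to queue its rescheduled run."""
    ex = ToyExecutor()
    ex.queued_tasks["sensor"] = "first run"
    ex.trigger_tasks(open_slots=1)   # first run starts, key enters `running`
    if state_event_arrives:
        ex.change_state("sensor")    # the executor learns the first run ended
    ex.queued_tasks["sensor"] = "rescheduled run"
    started = []
    for _ in range(QUEUEING_ATTEMPTS):  # one pass per simulated heartbeat
        started += ex.trigger_tasks(open_slots=1)
        if not ex.queued_tasks:
            break
    return started, ex.log


if __name__ == "__main__":
    print(simulate(state_event_arrives=True))
    print(simulate(state_event_arrives=False))
```

In this toy model the rescheduled run starts immediately when the state-change event arrives. When it never arrives, the key stays in `running`, the task is deferred on four passes, and it is dropped on the fifth, which is the pattern reported in the issue.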




[GitHub] [airflow] notatallshaw-work commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
notatallshaw-work commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1219599153

   Looks like it was our fault! 
   
   It seems the issue was that our scheduler's Celery results backend was pointing to a different database than our workers' Celery results backend 🤦‍♂️. 
   
   Thanks for responding earlier, sorry it was on our side.
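
For anyone hitting the same symptom, a rough sketch of a check that compares the results backend URL configured on the scheduler with the one configured on a worker. The URLs, host names, and the helper names `backend_identity` and `same_backend` are hypothetical; the comparison deliberately ignores credentials and looks only at scheme, host, port, and database path.

```python
from urllib.parse import urlsplit


def backend_identity(url):
    """Reduce a result-backend URL to the parts that identify the store."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port, parts.path.rstrip("/"))


def same_backend(scheduler_url, worker_url):
    """True when both URLs point at the same host, port, and database."""
    return backend_identity(scheduler_url) == backend_identity(worker_url)


# Hypothetical values; read the real ones from each container's configuration.
scheduler_url = "db+mysql://airflow:secret@mysql-a:3306/airflow"
worker_url = "db+mysql://airflow:secret@mysql-b:3306/airflow"

if __name__ == "__main__":
    print(same_backend(scheduler_url, worker_url))
```

The value to compare can be read on each container with `airflow config get-value celery result_backend`, or from the `AIRFLOW__CELERY__RESULT_BACKEND` environment variable if the setting is provided that way.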




[GitHub] [airflow] notatallshaw-work closed issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
notatallshaw-work closed issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule
URL: https://github.com/apache/airflow/issues/25728




[GitHub] [airflow] notatallshaw-work commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
notatallshaw-work commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1216995288

   After enabling debug logs I see something very interesting. On Airflow 2.1.3 I see the debug message "`Changing state:`" quite often: https://github.com/apache/airflow/blob/2.1.3/airflow/executors/base_executor.py#L198. But I never see the equivalent message in the Airflow 2.3.3 debug logs, even though it's still there: https://github.com/apache/airflow/blob/2.3.3/airflow/executors/base_executor.py#L238
   
   Similarly, in Airflow 2.1.3 the count in the "running task instances" debug message often goes down to 0, but in Airflow 2.3.3 I never see it go down to 0.
   
   @potiuk @malthe sorry to ping you directly, but I'm really starting to think this is a bug in the change to the Celery executor rather than just our environment being broken. Are there any hints you can give that would help us better pin down what the problem might be?
   
   In the meantime I am going to try to reproduce the issue at home so I can post a reproducing example here that others can follow.




[GitHub] [airflow] boring-cyborg[bot] commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1215681686

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] malthe commented on issue #25728: Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Posted by GitBox <gi...@apache.org>.
malthe commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1217457826

   It would be useful to have logs to see what exactly is going on.

