Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/10 17:13:46 UTC

[GitHub] [airflow] fjmacagno opened a new issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

fjmacagno opened a new issue #20788:
URL: https://github.com/apache/airflow/issues/20788


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   Our Airflow instance was not scheduling any tasks, not even simple ones using the default pool. The log showed that it was attempting to run 64 tasks, and that every one of them used a pool that didn't exist. When I created the missing pool, the scheduler started those tasks and began clearing the queue.
   
   ### What you expected to happen
   
   I expected the scheduler to keep running correctly configured tasks and skip the incorrectly configured ones, rather than blocking.
   
   ### How to reproduce
   
   1. Create a DAG with 64 concurrent tasks, all assigned to a pool that doesn't exist.
   2. Create a second DAG with a single task that uses the default pool.
   3. Trigger the first DAG, then the second.
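
A minimal Python sketch of how one DAG could starve every other DAG, under a loudly stated assumption: in each loop the scheduler fetches at most `max_tis` scheduled task instances (highest priority first) and only afterwards checks whether each task's pool exists. `TaskInstance` is a hypothetical stand-in dataclass, not Airflow's model, and the warning text is illustrative.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("scheduler_sketch")


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a scheduled task instance (not Airflow's model)."""
    dag_id: str
    task_id: str
    pool: str
    priority_weight: int = 1


def pick_tasks_to_queue(scheduled, existing_pools, max_tis):
    """Assumed order of operations: limit the candidate window first, check pools second."""
    # Highest priority first; sorted() is stable, so ties keep their original order.
    window = sorted(scheduled, key=lambda ti: -ti.priority_weight)[:max_tis]
    queued, missing_pools = [], set()
    for ti in window:
        if ti.pool not in existing_pools:
            missing_pools.add(ti.pool)  # skipped, but it still used up a slot in the window
            continue
        queued.append(ti)
    for pool in sorted(missing_pools):
        log.warning("Tasks using non-existent pool '%s' will not be scheduled", pool)
    return queued


broken = [TaskInstance("broken_dag", f"t{i}", "missing_pool") for i in range(64)]
healthy = [TaskInstance("healthy_dag", "only_task", "default_pool")]
queued = pick_tasks_to_queue(broken + healthy, {"default_pool"}, max_tis=64)
print(len(queued))  # 0: the healthy task sits at position 65 and never enters the window
```

In this model the skipped tasks never leave the scheduled state, so the same 64 rows fill the window on every loop, which matches the reported symptom; creating the missing pool makes them eligible and unblocks everything behind them.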
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Using KubernetesExecutor connected to EKS.
   
   ### Anything else
   
   Unfortunately, I don't have access to the logs anymore.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SamWheating commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010223049


   >  I just don't think the whole scheduler should stop scheduling tasks because one DAG is misconfigured. 
   
   Agreed, although in this case I think that setting `dagrun_timeout`, `max_active_runs` or `max_active_tasks` on your DAG can limit how many resources a single DAG can consume, and thus reduce the blast radius of such an incident.
   
   > Maybe it could at least be shoved to the back of the queue so that other tasks can try to run?
   
   Agreed. I'll look through the scheduler logic to see how viable this is.





[GitHub] [airflow] ephraimbuddy commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

ephraimbuddy commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1051411415


   @SamWheating @fjmacagno let me know what you think about my PR





[GitHub] [airflow] fjmacagno commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

fjmacagno commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010267758


   That's an interesting point, though, because we do have most of those set. We can't use `dagrun_timeout` because it is a 15-day-long DAG, but `max_active_runs` is 1 and `dag_concurrency` is 16, while scheduler `parallelism` is 64, so it sounds like something is amiss there too.





[GitHub] [airflow] fjmacagno commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

fjmacagno commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010173998


   I mean, I just don't think the whole scheduler should stop scheduling tasks because one DAG is misconfigured. This caused an entire cross-team Airflow installation to stop working because one team made a mistake in one DAG, and gods help us if that had been prod.
   
   It seems like a task that is misconfigured in a way that prevents it from running shouldn't be considered part of the queue. Maybe it could at least be shoved to the back of the queue so that other tasks can try to run?
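
The "back of the queue" idea can be sketched by changing only the sort key in a simplified model of the candidate window: candidates whose pool is missing sort after every runnable candidate, so they can no longer crowd healthy tasks out. This is a hypothetical illustration, not scheduler code; `TaskInstance` and `pick_tasks_to_queue` are invented names, and a real scheduler would more likely filter such rows out of its database query.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a scheduled task instance."""
    task_id: str
    pool: str
    priority_weight: int = 1


def pick_tasks_to_queue(scheduled, existing_pools, max_tis):
    """Hypothetical variant: candidates whose pool is missing sort to the back."""
    # False sorts before True, so runnable TIs come first; priority still orders each group.
    window = sorted(
        scheduled,
        key=lambda ti: (ti.pool not in existing_pools, -ti.priority_weight),
    )[:max_tis]
    return [ti for ti in window if ti.pool in existing_pools]


broken = [TaskInstance(f"t{i}", "missing_pool") for i in range(64)]
healthy = [TaskInstance("only_task", "default_pool")]
queued = pick_tasks_to_queue(broken + healthy, {"default_pool"}, max_tis=64)
print([ti.task_id for ti in queued])  # ['only_task']
```

Whether the misconfigured rows are sorted last or dropped entirely, the healthy task is no longer starved; the misconfigured tasks still sit in the scheduled state until someone notices them.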





[GitHub] [airflow] SamWheating commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010135201


   I think that this is mostly functioning as intended, but I'm wondering if we can improve the behaviour around nonexistent pools 🤔. This seems to be a somewhat common issue, and it can lead to pretty unclear behaviour if a user makes a mistake in the name of a pool.
   
   Maybe we should be failing tasks immediately if they're assigned to a pool which doesn't exist?
   
   I'll have a look into whether this is possible, but would definitely appreciate any other suggestions here. 





[GitHub] [airflow] SamWheating commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1051449145


   Your PR looks good! 
   
   I think that https://github.com/apache/airflow/pull/19747 also fixes this issue, but I like your approach more as it will prevent this un-runnable DAG from ever making it to the scheduler. 
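
Rejecting the DAG before it reaches the scheduler could look roughly like the check below. This is a hypothetical sketch, not the code from the PR under discussion: `validate_dag_pools` and `PoolNotFoundError` are invented names, and the task-to-pool mapping stands in for whatever the DAG loader actually inspects.

```python
class PoolNotFoundError(ValueError):
    """Hypothetical error for a DAG that references a pool that does not exist."""


def validate_dag_pools(dag_id, task_pools, existing_pools):
    """Raise at load time if any task points at an unknown pool.

    task_pools maps task_id -> pool name (an assumed input shape).
    """
    missing = {task: pool for task, pool in task_pools.items() if pool not in existing_pools}
    if missing:
        details = ", ".join(f"{task} -> {pool!r}" for task, pool in sorted(missing.items()))
        raise PoolNotFoundError(f"DAG {dag_id!r} uses non-existent pool(s): {details}")


try:
    validate_dag_pools("broken_dag", {"t0": "missing_pool"}, {"default_pool"})
except PoolNotFoundError as err:
    print(err)  # DAG 'broken_dag' uses non-existent pool(s): t0 -> 'missing_pool'
```

One trade-off worth keeping in mind: pools live in the metadata database, so a check like this depends on the pool existing when the file is parsed, and the DAG would only load once the pool is created and the file is parsed again.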





[GitHub] [airflow] SamWheating commented on issue #20788: Airflow Scheduler does not schedule any tasks when >max running tasks queued with non-existent pool

SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010169302


   OK, so looking into this a bit, the scheduler will log a warning if a task is unschedulable due to a non-existent pool:
   
   https://github.com/apache/airflow/blob/905baf9fa5402ccc062536915fd1911d812f625b/airflow/jobs/scheduler_job.py#L335-L339
   
   This warning is also visible in the TaskInstance Details UI:
   ![image](https://user-images.githubusercontent.com/16950874/148987077-295ed5d5-69a5-4be1-a46c-fda1ea9e1828.png)
   
   The task then remains in the `queued` state indefinitely (or, I suppose, until it times out).
   
   It would be really simple to just mark the tasks as failed after logging something like `Tasks using non-existent pool '%s' being marked as Failed`, but this might be a worse user experience, as it leaves no logs or visible warnings about why the task failed (other than the scheduler logs, which are not easily accessible to most users).
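
For comparison, the "mark as failed" option could look like this hypothetical sketch. The warning reuses the message suggested above; note that it goes only to the scheduler's own logger, which is exactly the visibility problem described. `TaskInstance` and `fail_tasks_with_missing_pools` are illustrative names, not Airflow APIs.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("scheduler_sketch")


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a scheduled task instance."""
    task_id: str
    pool: str
    state: str = "scheduled"


def fail_tasks_with_missing_pools(scheduled, existing_pools):
    """Hypothetical alternative: fail TIs in unknown pools instead of leaving them scheduled."""
    failed = []
    for ti in scheduled:
        if ti.pool not in existing_pools:
            ti.state = "failed"
            failed.append(ti)
    # One warning per missing pool; nothing is written to the task's own log.
    for pool in sorted({ti.pool for ti in failed}):
        log.warning("Tasks using non-existent pool '%s' being marked as Failed", pool)
    return failed


tis = [TaskInstance("t0", "missing_pool"), TaskInstance("t1", "default_pool")]
fail_tasks_with_missing_pools(tis, {"default_pool"})
print([(ti.task_id, ti.state) for ti in tis])  # [('t0', 'failed'), ('t1', 'scheduled')]
```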
   
   With this in mind, does anyone have another idea for how to prevent these tasks from clogging the scheduler, or should we just consider this to be intended behaviour?
   
   

