Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/10 17:13:46 UTC
[GitHub] [airflow] fjmacagno opened a new issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
fjmacagno opened a new issue #20788:
URL: https://github.com/apache/airflow/issues/20788
### Apache Airflow version
2.2.3 (latest released)
### What happened
Our Airflow instance was not scheduling any tasks, even simple ones using the default pools. The log showed that it was attempting to run 64 tasks, and that every one was trying to use a pool that didn't exist. When I created the missing pool, the scheduler started the tasks and began clearing the queue.
### What you expected to happen
The scheduler to continue running correctly-configured tasks, ignoring the incorrectly configured ones, rather than blocking.
### How to reproduce
Create a DAG with 64 concurrent tasks, and assign them to a pool that doesn't exist. Create a second DAG with a single task using the default pool. Trigger the first, then the second.
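The reproduction above can be illustrated with a toy model in plain Python. This is only a sketch and not Airflow's scheduler code: it assumes the scheduler inspects a bounded batch of queued candidates per pass (here `batch_size=64`, matching the 64 tasks in the report) and then discards any whose pool does not exist. The names `Task`, `pick_runnable`, `missing_pool` and `batch_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    pool: str


def pick_runnable(queue, existing_pools, batch_size):
    """Toy scheduler pass (assumption, not Airflow's code): look only at the
    first `batch_size` candidates, then drop those whose pool is missing."""
    batch = queue[:batch_size]
    return [t for t in batch if t.pool in existing_pools]


# 64 tasks pointing at a pool that was never created, queued first.
queue = [Task(f"bad_{i}", "missing_pool") for i in range(64)]
# One correctly configured task, queued afterwards.
queue.append(Task("good", "default_pool"))

# While the pool is missing, the whole batch is filtered out and the
# correctly configured task is never even considered.
scheduled = pick_runnable(queue, {"default_pool"}, batch_size=64)

# Once the pool is created, the batch becomes runnable and the queue drains.
scheduled_after_fix = pick_runnable(
    queue, {"default_pool", "missing_pool"}, batch_size=64
)
```

Under this assumption the well-configured task is starved purely by its position behind the unrunnable ones, which matches the symptom described above.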
### Operating System
ubuntu
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other Docker-based deployment
### Deployment details
Using KubernetesExecutor connected to EKS.
### Anything else
Unfortunately I don't have access to the logs anymore.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] SamWheating commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010223049
> I just dont think the whole scheduler should stop scheduling tasks because one dag is misconfigured.
Agreed - although in this case I think that setting `dagrun_timeout`, `max_active_runs` or `max_active_tasks` on your DAG can keep a single DAG from using too many resources and thus limit the blast radius of such an incident.
> Maybe it could at least be shoved to the back of the queue so that other tasks can try to run?
Agreed, I'll have a look through the scheduler logic to see how viable this is.
[GitHub] [airflow] ephraimbuddy commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1051411415
@SamWheating @fjmacagno let me know what you think about my PR
[GitHub] [airflow] fjmacagno commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
fjmacagno commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010267758
That's an interesting point though, because we do have most of those set. We can't use dagrun_timeout because it is a 15-day-long DAG, but max_active_runs is 1 and dag_concurrency is 16, while scheduler parallelism is 64, so it sounds like something is amiss there too.
[GitHub] [airflow] fjmacagno commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
fjmacagno commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010173998
I mean, I just don't think the whole scheduler should stop scheduling tasks because one DAG is misconfigured. This caused an entire cross-team Airflow installation to stop working because one team made a mistake on one DAG, and gods help us if that had been prod.
It seems like if a task is misconfigured in some way that prevents it from running, it shouldn't be considered to be in the queue. Maybe it could at least be shoved to the back of the queue so that other tasks can try to run?
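The "back of the queue" idea can be sketched in plain Python. This is a hypothetical illustration, not Airflow's scheduler logic: `QueuedTask`, `next_batch` and `batch_size` are made-up names, and the bounded per-pass batch is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedTask:
    task_id: str
    pool: str


def next_batch(queue, existing_pools, batch_size):
    """Move tasks whose pool is missing behind runnable ones, then take a
    bounded batch. sorted() is stable and False sorts before True, so
    runnable tasks keep their relative order and come first."""
    ordered = sorted(queue, key=lambda t: t.pool not in existing_pools)
    batch = ordered[:batch_size]
    # Tasks with a missing pool still cannot run, so drop them from the batch.
    return [t for t in batch if t.pool in existing_pools]


queue = [QueuedTask(f"bad_{i}", "missing_pool") for i in range(64)]
queue.append(QueuedTask("good", "default_pool"))

batch = next_batch(queue, {"default_pool"}, batch_size=64)
```

With the misconfigured tasks pushed back, the single well-configured task is picked up even though 64 unrunnable tasks were queued ahead of it.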
[GitHub] [airflow] SamWheating commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010135201
I think that this is mostly functioning as intended, but I'm wondering if we can improve the behaviour around nonexistent pools 🤔 I think that this is a somewhat common issue and it can lead to pretty unclear behaviour if a user makes a mistake in the name of a pool.
Maybe we should be failing tasks immediately if they're assigned to a pool which doesn't exist?
I'll have a look into whether this is possible, but would definitely appreciate any other suggestions here.
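As a rough illustration of the "fail immediately" option, here is a minimal plain-Python sketch. It is not Airflow code: `triage`, the task tuples and the misspelled pool name `etl_pol` are hypothetical, and recording a per-task failure reason is an assumption about what a usable version would need.

```python
def triage(tasks, existing_pools):
    """Split (task_id, pool) pairs into runnable task ids and a mapping of
    failed task ids to a human-readable reason."""
    runnable, failed = [], {}
    for task_id, pool in tasks:
        if pool in existing_pools:
            runnable.append(task_id)
        else:
            # Keep the reason with the task so the failure is explainable.
            failed[task_id] = f"pool {pool!r} does not exist"
    return runnable, failed


# "etl_pol" is a deliberate typo of "etl_pool" to mimic a misnamed pool.
tasks = [("extract", "default_pool"), ("load", "etl_pol")]
runnable, failed = triage(tasks, {"default_pool", "etl_pool"})
```

Keeping the reason next to the failed task is the part that matters for users: a bare failed state would not say that a mistyped pool name was the cause.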
[GitHub] [airflow] SamWheating commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1051449145
Your PR looks good!
I think that https://github.com/apache/airflow/pull/19747 also fixes this issue, but I like your approach more as it will prevent this un-runnable DAG from ever making it to the scheduler.
[GitHub] [airflow] SamWheating commented on issue #20788: AIrflow Scheduler does not schedule any tasks when >max running tasks queued with non-existant pool
Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #20788:
URL: https://github.com/apache/airflow/issues/20788#issuecomment-1010169302
OK, so looking into this a bit, the scheduler will log a warning if a task is unschedulable due to a non-existent pool:
https://github.com/apache/airflow/blob/905baf9fa5402ccc062536915fd1911d812f625b/airflow/jobs/scheduler_job.py#L335-L339
This warning is also visible in the TaskInstance Details UI:
![image](https://user-images.githubusercontent.com/16950874/148987077-295ed5d5-69a5-4be1-a46c-fda1ea9e1828.png)
The task will then remain in the `queued` state indefinitely (or until it times out, I suppose).
It would be really simple to just mark the tasks as failed after logging something like `Tasks using non-existent pool '%s' being marked as Failed`, but this might be a worse user experience, as it leaves no logs or visible warnings about why the task failed (other than the scheduler logs, which are not easily accessible to most users).
With this in mind, does anyone have another idea for how to prevent these tasks from clogging the scheduler, or should we just consider this to be intended behaviour?