Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/06 05:27:54 UTC

[GitHub] [airflow] lukas-at-harren edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

lukas-at-harren edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-813831995


   @kaxil I have checked: `min_file_process_interval` is set to `30`; however, the problem is still there for me.
   
   @SalmonTimo I have pretty high CPU utilisation (60%), even though the scheduler settings are default. But why? Does this matter?
   
   --
   
   Same issue, new day: I have Airflow running, the scheduler running, and the whole cluster has 103 scheduled tasks and 3 queued tasks, yet nothing is executing at all. I highly doubt that `min_file_process_interval` is the root of the problem.
   I suggest somebody mark this issue with a higher priority; I do not think that "regularly restarting the scheduler" is a reasonable solution.
   
   --
   
   What we need here is some factual inspection of the Python process.
   I am no Python expert, but I am proficient and know my way around other VMs (Erlang, Ruby).
   
   Following that stack-trace idea, I just learned that Python cannot dump a process the way some other runtimes can (https://stackoverflow.com/a/141826/128351); otherwise I would have provided such a process dump of my running "scheduler".
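   
   (Edit: a minimal sketch of what I had in mind, assuming one can add a few lines before the scheduler starts -- the stdlib `faulthandler` module can at least print per-thread stack traces of a live process on demand:)
   
   ```python
   import faulthandler
   import signal
   
   # Print the current stack trace of every thread to stderr whenever
   # the process receives SIGUSR1. The process keeps running.
   faulthandler.register(signal.SIGUSR1)
   ```
   
   With that in place, `kill -USR1 <scheduler-pid>` should dump all thread stacks to the scheduler's stderr. Alternatively, I understand py-spy (`py-spy dump --pid <PID>`) can attach to an already running process without any code changes, and `py-spy top` might show where the 60% CPU is going.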
   
   I am very happy to provide you with some facts about my stalled scheduler, if you tell me how you would debug such an issue.
   
   What I currently have:
   
   * CPU utilisation of the scheduler is still pretty high (around 60%).
   * `AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL` is set to `30`
   * `AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL` is set to `10`
   * Log output of scheduler:
   
   ```
   [2021-04-06 05:19:56,201] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
   
   [2021-04-06 05:19:57,865] {scheduler_job.py:941} INFO - 15 tasks up for execution:
   
   # ... snip ...
   
   [2021-04-06 05:19:57,876] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=mssql_dwh) with 0 open slots and 15 task instances ready to be queued
   [2021-04-06 05:19:57,882] {scheduler_job.py:985} INFO - Not scheduling since there are 0 open slots in pool mssql_dwh
   ```
   
   What I find striking is the message `INFO - Not scheduling since there are 0 open slots in pool mssql_dwh`.
   That pool is configured for a maximum of 3 slots, yet not a single task is running. I fear the bug is that the scheduler might be losing track of running tasks on Kubernetes. Bluntly, I guess there is a bug somewhere in the interaction of these components (see the query sketch after this list):
   
   * Scheduler
   * KubernetesExecutor
   * Pools
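   
   (Edit: to check whether the pool accounting has diverged from reality, here is a sketch of a query against the Airflow metadata database -- the connection URL is a placeholder for your own setup, and the `task_instance` table with its `pool` and `state` columns is, as far as I understand, what the scheduler derives open slots from:)
   
   ```python
   # Sketch: list the task instances that are charged against the
   # mssql_dwh pool. Placeholder connection URL; table and column
   # names as found in the Airflow 2.x metadata schema.
   from sqlalchemy import create_engine, text
   
   engine = create_engine("postgresql://airflow:airflow@localhost/airflow")
   with engine.connect() as conn:
       rows = conn.execute(text(
           "SELECT dag_id, task_id, state "
           "FROM task_instance "
           "WHERE pool = 'mssql_dwh' AND state IN ('running', 'queued')"
       ))
       for dag_id, task_id, state in rows:
           print(dag_id, task_id, state)
   ```
   
   If that returns rows in state `queued` or `running` that do not correspond to any live pod on the cluster, then the occupied-slot count and reality have indeed diverged. (`airflow pools list` should show the configured slot count of 3 for comparison, at least on Airflow 2.x.)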

