Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/23 23:01:30 UTC

[GitHub] [airflow] careduz commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

careduz commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784574696


   We are facing the same issue (the scheduler liveness probe keeps failing, so the scheduler is repeatedly restarted). Details:
   
   **Airflow: Version 1.10.14**
   **Kubernetes: Version 1.20.2** (DigitalOcean)
   **Helm airflow-stable/airflow: Version 7.16.0**
   
   ```
   Events:
     Type     Reason     Age                From               Message
     ----     ------     ----               ----               -------
     Normal   Scheduled  27m                default-scheduler  Successfully assigned airflow/airflow-scheduler-75c6c96d68-r9j4m to apollo-kaon3thg1-882c2
     Normal   Pulled     27m                kubelet            Container image "alpine/git:latest" already present on machine
     Normal   Created    27m                kubelet            Created container git-clone
     Normal   Started    27m                kubelet            Started container git-clone
     Normal   Pulled     26m                kubelet            Container image "alpine/git:latest" already present on machine
     Normal   Created    26m                kubelet            Created container git-sync
     Normal   Started    26m                kubelet            Started container git-sync
     Normal   Killing    12m (x2 over 19m)  kubelet            Container airflow-scheduler failed liveness probe, will be restarted
     Normal   Pulled     11m (x3 over 26m)  kubelet            Container image "apache/airflow:1.10.14-python3.7" already present on machine
     Normal   Created    11m (x3 over 26m)  kubelet            Created container airflow-scheduler
     Normal   Started    11m (x3 over 26m)  kubelet            Started container airflow-scheduler
     Warning  Unhealthy  6m (x12 over 21m)  kubelet            Liveness probe failed:
   ```
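   
   For context, the scheduler liveness probe in charts of this era is typically an `exec` probe that runs a short Python check against the Airflow metadata DB. A rough sketch of that check (not necessarily the exact script shipped with airflow-stable/airflow 7.16.0, so verify against the pod spec with `kubectl get pod ... -o yaml`) looks like this:
   
   ```python
   # Rough sketch of the kind of check such a liveness probe runs.
   # Module paths are the Airflow 1.10.x internals; confirm them for your install.
   import sys
   
   from airflow.jobs.scheduler_job import SchedulerJob
   from airflow.utils.db import create_session
   from airflow.utils.net import get_hostname
   
   with create_session() as session:
       # Most recent SchedulerJob heartbeat recorded for this host.
       job = (
           session.query(SchedulerJob)
           .filter(SchedulerJob.hostname == get_hostname())
           .order_by(SchedulerJob.latest_heartbeat.desc())
           .first()
       )
   
   # is_alive() only passes while the job is running and its heartbeat is recent,
   # so a scheduler that stops heartbeating fails the probe and gets restarted.
   sys.exit(0 if job is not None and job.is_alive() else 1)
   ```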
   
   And the scheduler logs essentially repeat in a loop:
   ```
   1] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor409-Process, stopped)>
   [2021-02-23 22:58:35,578] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:35,578] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,580] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:35,581] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:35,581] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:35,582] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:35,582] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:35,582] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:35,587] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
   [2021-02-23 22:58:35,587] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   [2021-02-23 22:58:36,589] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
   [2021-02-23 22:58:36,729] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6719)
   [2021-02-23 22:58:36,930] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6717)
   [2021-02-23 22:58:37,258] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor410-Process, stopped)>
   [2021-02-23 22:58:37,259] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor411-Process, stopped)>
   [2021-02-23 22:58:37,582] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:37,583] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:37,584] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,586] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,588] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,589] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:37,592] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:37,593] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:37,602] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:37,604] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:37,605] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:37,607] {scheduler_job.py:1460} DEBUG - Heartbeating the scheduler
   [2021-02-23 22:58:37,620] {base_job.py:197} DEBUG - [heartbeat]
   [2021-02-23 22:58:37,630] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.05 seconds
   [2021-02-23 22:58:37,631] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   [2021-02-23 22:58:38,165] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6769)
   [2021-02-23 22:58:38,268] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6765)
   [2021-02-23 22:58:38,276] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor412-Process, started)>
   [2021-02-23 22:58:38,284] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor413-Process, stopped)>
   [2021-02-23 22:58:38,633] {scheduler_job.py:1484} DEBUG - Sleeping for 0.95 seconds to prevent excessive logging
   [2021-02-23 22:58:39,331] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6797)
   [2021-02-23 22:58:39,361] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6801)
   [2021-02-23 22:58:39,589] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:39,589] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:39,591] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:39,591] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:39,592] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:39,593] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:39,594] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:39,596] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
   [2021-02-23 22:58:39,597] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   [2021-02-23 22:58:40,305] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor414-Process, stopped)>
   [2021-02-23 22:58:40,306] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor415-Process, stopped)>
   [2021-02-23 22:58:40,599] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
   [2021-02-23 22:58:41,349] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6829)
   [2021-02-23 22:58:41,386] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6831)
   [2021-02-23 22:58:41,595] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:41,595] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:41,596] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,597] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,598] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,599] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,600] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:41,601] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:41,602] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:41,602] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:41,604] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:41,604] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:41,607] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
   [2021-02-23 22:58:41,608] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   ```
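   
   Since the debug logs still show `Heartbeating the scheduler` / `[heartbeat]` every couple of loops, one way to cross-check is to see how stale the recorded heartbeat actually is from inside the scheduler container. A rough diagnostic sketch, assuming the usual Airflow 1.10.x internals (module paths may differ in your install):
   
   ```python
   # Rough diagnostic: print the age of the most recently recorded scheduler
   # heartbeat. Run with `python` inside the scheduler container.
   from airflow.jobs.scheduler_job import SchedulerJob
   from airflow.utils import timezone
   from airflow.utils.db import create_session
   
   with create_session() as session:
       job = (
           session.query(SchedulerJob)
           .order_by(SchedulerJob.latest_heartbeat.desc())
           .first()
       )
       if job is None:
           print("No SchedulerJob rows in the metadata DB")
       else:
           lag = (timezone.utcnow() - job.latest_heartbeat).total_seconds()
           print("state=%s hostname=%s heartbeat lag=%.1fs" % (job.state, job.hostname, lag))
   ```
   
   If the lag stays small while the probe keeps failing, the problem is likely in the probe itself (e.g. its hostname or timeout settings) rather than in the scheduler loop.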

