Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/16 14:38:04 UTC
[GitHub] [airflow] stoiandl opened a new issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl opened a new issue #14261:
URL: https://github.com/apache/airflow/issues/14261
**Apache Airflow version**: 2.0
**Kubernetes version**: 1.18.14
**Environment**: Azure (AKS)
**What happened**:
I have just upgraded Airflow from 1.10.13 to 2.0. I am running it on Kubernetes (Azure AKS) with the Kubernetes Executor. Unfortunately, my scheduler gets killed every 15-20 minutes because the liveness probe fails, so the pod keeps restarting.
**Liveness probe**
```
import os
os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
from airflow.utils.net import get_hostname
import sys

with create_session() as session:
    job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
        SchedulerJob.latest_heartbeat.desc()).limit(1).first()
    sys.exit(0 if job.is_alive() else 1)
```
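For reference, the `is_alive()` call at the end boils down to a heartbeat-freshness check against Airflow's `scheduler_health_check_threshold` (30 seconds by default in 2.0). A minimal stand-alone sketch of that logic, with hypothetical names rather than Airflow's actual implementation:

```python
from datetime import datetime, timedelta, timezone

def heartbeat_is_fresh(latest_heartbeat: datetime, threshold_secs: int = 30) -> bool:
    """Rough equivalent of SchedulerJob.is_alive(): the scheduler counts as
    alive while its most recent heartbeat is younger than the threshold."""
    age = datetime.now(timezone.utc) - latest_heartbeat
    return age < timedelta(seconds=threshold_secs)

now = datetime.now(timezone.utc)
print(heartbeat_is_fresh(now))                            # True: heartbeat just happened
print(heartbeat_is_fresh(now - timedelta(seconds=120)))   # False: heartbeat is stale
```

Note also that if the query in the probe returns no row at all, `job` is `None` and the script dies with an `AttributeError` instead of a clean exit code, which the kubelet counts as a probe failure too.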
**Scheduler logs**
```
[2021-02-16 12:18:22,422] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-16 12:18:22,426] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-16 12:18:22,426] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-16 12:18:22,426] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-16 12:18:22,427] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-16 12:18:22,427] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-16 12:18:22,427] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-16 12:18:22,439] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2021-02-16 12:18:22,452] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12819)
[2021-02-16 12:18:22,460] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor490-Process' pid=12819 parent=9286 stopped exitcode=0>
[2021-02-16 12:18:23,009] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12826)
[2021-02-16 12:18:23,017] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor491-Process' pid=12826 parent=9286 stopped exitcode=0>
[2021-02-16 12:18:23,594] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12833)
... Many of these Disposing DB connection pool entries here
[2021-02-16 12:20:08,212] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor675-Process' pid=14146 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:08,916] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14153)
[2021-02-16 12:20:08,924] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor676-Process' pid=14153 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:09,475] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14160)
[2021-02-16 12:20:09,484] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor677-Process' pid=14160 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:10,044] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14167)
[2021-02-16 12:20:10,053] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor678-Process' pid=14167 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:10,610] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14180)
[2021-02-16 12:23:42,287] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
[2021-02-16 12:23:43,290] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 9286
[2021-02-16 12:23:43,494] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
[2021-02-16 12:23:43,503] {process_utils.py:61} INFO - Process psutil.Process(pid=14180, status='terminated', started='12:20:09') (14180) terminated with exit code None
[2021-02-16 12:23:43,503] {process_utils.py:61} INFO - Process psutil.Process(pid=9286, status='terminated', exitcode=0, started='12:13:35') (9286) terminated with exit code 0
[2021-02-16 12:23:43,506] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 9286
[2021-02-16 12:23:43,506] {scheduler_job.py:1296} INFO - Exited execute loop
[2021-02-16 12:23:43,523] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
[2021-02-16 12:23:43,525] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7)
```
**Scheduler deployment**
```
---
################################
## Airflow Scheduler Deployment/StatefulSet
#################################
kind: Deployment
apiVersion: apps/v1
metadata:
  name: airflow-scheduler
  namespace: airflow
  labels:
    tier: airflow
    component: scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: airflow
      component: scheduler
  template:
    metadata:
      labels:
        tier: airflow
        component: scheduler
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      nodeSelector:
        {}
      affinity:
        {}
      tolerations:
        []
      restartPolicy: Always
      terminationGracePeriodSeconds: 10
      serviceAccountName: airflow-scheduler
      securityContext:
        runAsUser: 50000
        fsGroup: 50000
      initContainers:
        - name: run-airflow-migrations
          image: apache/airflow:2.0.0-python3.8
          imagePullPolicy: IfNotPresent
          # Support running against 1.10.x and 2.0.0dev/master
          args: ["bash", "-c", "airflow db upgrade"]
          env:
            # Dynamically created environment variables
            # Dynamically created secret envs
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: fernet-key
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
      containers:
        # Always run the main scheduler container.
        - name: scheduler
          image: apache/airflow:2.0.0-python3.8
          imagePullPolicy: Always
          args: ["bash", "-c", "exec airflow scheduler"]
          env:
            # Dynamically created environment variables
            # Dynamically created secret envs
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: fernet-key
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
            - name: DEPENDENCIES
              value: "/opt/airflow/dags/repo/dags/dependencies/"
          # If the scheduler stops heartbeating for 5 minutes (10*30s) kill the
          # scheduler and let Kubernetes restart it
          livenessProbe:
            failureThreshold: 10
            periodSeconds: 30
            exec:
              command:
                - python
                - -Wignore
                - -c
                - |
                  import os
                  os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                  os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

                  from airflow.jobs.scheduler_job import SchedulerJob
                  from airflow.utils.db import create_session
                  from airflow.utils.net import get_hostname
                  import sys

                  with create_session() as session:
                      job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                          SchedulerJob.latest_heartbeat.desc()).limit(1).first()
                      sys.exit(0 if job.is_alive() else 1)
          resources:
            {}
          volumeMounts:
            - name: config
              mountPath: /opt/airflow/pod_templates/pod_template_file.yaml
              subPath: pod_template_file.yaml
              readOnly: true
            - name: logs
              mountPath: "/opt/airflow/logs"
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
            - name: dags
              mountPath: /opt/airflow/dags
            - name: logs-conf
              mountPath: "/opt/airflow/config/log_config.py"
              subPath: log_config.py
              readOnly: true
            - name: logs-conf-ini
              mountPath: "/opt/airflow/config/__init__.py"
              subPath: __init__.py
              readOnly: true
        - name: git-sync
          image: "k8s.gcr.io/git-sync:v3.1.6"
          securityContext:
            runAsUser: 65533
          env:
            - name: GIT_SYNC_REV
              value: "HEAD"
            - name: GIT_SYNC_BRANCH
              value: "master"
            - name: GIT_SYNC_REPO
              value: HIDDEN
            - name: GIT_SYNC_DEPTH
              value: "1"
            - name: GIT_SYNC_ROOT
              value: "/git"
            - name: GIT_SYNC_DEST
              value: "repo"
            - name: GIT_SYNC_ADD_USER
              value: "true"
            - name: GIT_SYNC_WAIT
              value: "60"
            - name: GIT_SYNC_MAX_SYNC_FAILURES
              value: "0"
            - name: GIT_SYNC_USERNAME
              valueFrom:
                secretKeyRef:
                  name: 'codecommit-key'
                  key: username
            - name: GIT_SYNC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: 'codecommit-key'
                  key: password
          volumeMounts:
            - name: dags
              mountPath: /git
        # Always start the garbage collector sidecar.
        - name: scheduler-gc
          image: apache/airflow:2.0.0-python3.8
          imagePullPolicy: Always
          args: ["bash", "/clean-logs"]
          volumeMounts:
            - name: logs
              mountPath: "/opt/airflow/logs"
            - name: logs-conf
              mountPath: "/opt/airflow/config/log_config.py"
              subPath: log_config.py
              readOnly: true
            - name: logs-conf-ini
              mountPath: "/opt/airflow/config/__init__.py"
              subPath: __init__.py
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: airflow-airflow-config
        - name: dags
          emptyDir: {}
        - name: logs
          emptyDir: {}
        - name: logs-conf
          configMap:
            name: airflow-airflow-config
        - name: logs-conf-ini
          configMap:
            name: airflow-airflow-config
```
![image](https://user-images.githubusercontent.com/35636329/108077458-71031500-7064-11eb-9858-b47790b82404.png)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
boring-cyborg[bot] commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-779877565
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] kaxil commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
kaxil commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-813016533
Interesting. Can one of you create another GH issue with these reproduction details using the Helm Chart and KinD, please? @marclamberti @madden1706
[GitHub] [airflow] marclamberti commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
marclamberti commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-809560041
Unfortunately not
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned airflow/airflow-scheduler-78997cd45-6j5mw to airflow-cluster-worker
Normal Pulled 16m kubelet Container image "apache/airflow:2.0.1" already present on machine
Normal Created 16m kubelet Created container wait-for-airflow-migrations
Normal Started 16m kubelet Started container wait-for-airflow-migrations
Normal Pulled 15m kubelet Container image "apache/airflow:2.0.1" already present on machine
Normal Created 15m kubelet Created container scheduler-gc
Normal Started 15m kubelet Started container scheduler-gc
Normal Killing 11m kubelet Container scheduler failed liveness probe, will be restarted
Normal Pulled 11m (x2 over 15m) kubelet Container image "apache/airflow:2.0.1" already present on machine
Normal Started 11m (x2 over 15m) kubelet Started container scheduler
Normal Created 11m (x2 over 15m) kubelet Created container scheduler
Warning Unhealthy 95s (x29 over 15m) kubelet Liveness probe failed:
```
But after a lot of trial and error, I found that these two lines:
```
from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
```
make the liveness probe fail.
Right now I have only this:
```
import os
os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
from airflow.utils.net import get_hostname
import sys
sys.exit(0)
```
and still, the scheduler is restarted every 5 minutes.
If I remove those two imports, it works: the probe simply runs `sys.exit(0)` and passes.
[GitHub] [airflow] jedcunningham commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
jedcunningham commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-809523328
@marclamberti is there anything useful in the pods events?
[GitHub] [airflow] stoiandl edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202
I managed to fix my restarts by setting the following config options:
```
[kubernetes]
...
delete_option_kwargs = {"grace_period_seconds": 10}
enable_tcp_keepalive = True
tcp_keep_idle = 30
tcp_keep_intvl = 30
tcp_keep_cnt = 30
```
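For anyone applying this through the deployment rather than `airflow.cfg`: Airflow reads any config option from an environment variable named `AIRFLOW__{SECTION}__{KEY}`, so the same settings could be carried on the scheduler container as env entries. A sketch, with the values copied from the config above:

```yaml
# Env-var equivalents of the [kubernetes] options above
- name: AIRFLOW__KUBERNETES__DELETE_OPTION_KWARGS
  value: '{"grace_period_seconds": 10}'
- name: AIRFLOW__KUBERNETES__ENABLE_TCP_KEEPALIVE
  value: "True"
- name: AIRFLOW__KUBERNETES__TCP_KEEP_IDLE
  value: "30"
- name: AIRFLOW__KUBERNETES__TCP_KEEP_INTVL
  value: "30"
- name: AIRFLOW__KUBERNETES__TCP_KEEP_CNT
  value: "30"
```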
I have another Airflow instance running on Kubernetes in AWS. That one runs fine with any version, so I realized the problem is specific to Azure Kubernetes, namely the REST API calls to the API server.
[GitHub] [airflow] careduz edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
careduz edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784574696
We are facing the same issue (scheduler liveness probe always failing and restarting the scheduler). Details:
**Airflow: Version 1.10.14 & 1.10.13**
**Kubernetes: Version 1.20.2** (DigitalOcean)
**Helm airflow-stable/airflow: Version 7.16.0**
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned airflow/airflow-scheduler-75c6c96d68-r9j4m to apollo-kaon3thg1-882c2
Normal Pulled 27m kubelet Container image "alpine/git:latest" already present on machine
Normal Created 27m kubelet Created container git-clone
Normal Started 27m kubelet Started container git-clone
Normal Pulled 26m kubelet Container image "alpine/git:latest" already present on machine
Normal Created 26m kubelet Created container git-sync
Normal Started 26m kubelet Started container git-sync
Normal Killing 12m (x2 over 19m) kubelet Container airflow-scheduler failed liveness probe, will be restarted
Normal Pulled 11m (x3 over 26m) kubelet Container image "apache/airflow:1.10.14-python3.7" already present on machine
Normal Created 11m (x3 over 26m) kubelet Created container airflow-scheduler
Normal Started 11m (x3 over 26m) kubelet Started container airflow-scheduler
Warning Unhealthy 6m (x12 over 21m) kubelet Liveness probe failed:
```
And the logs are basically on a loop:
```
1] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor409-Process, stopped)>
[2021-02-23 22:58:35,578] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:35,578] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,580] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:35,581] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:35,581] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:35,582] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:35,582] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:35,582] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:35,587] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:35,587] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:36,589] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2021-02-23 22:58:36,729] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6719)
[2021-02-23 22:58:36,930] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6717)
[2021-02-23 22:58:37,258] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor410-Process, stopped)>
[2021-02-23 22:58:37,259] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor411-Process, stopped)>
[2021-02-23 22:58:37,582] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:37,583] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:37,584] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,586] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,588] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,589] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:37,592] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:37,593] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:37,602] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:37,604] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:37,605] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:37,607] {scheduler_job.py:1460} DEBUG - Heartbeating the scheduler
[2021-02-23 22:58:37,620] {base_job.py:197} DEBUG - [heartbeat]
[2021-02-23 22:58:37,630] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.05 seconds
[2021-02-23 22:58:37,631] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:38,165] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6769)
[2021-02-23 22:58:38,268] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6765)
[2021-02-23 22:58:38,276] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor412-Process, started)>
[2021-02-23 22:58:38,284] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor413-Process, stopped)>
[2021-02-23 22:58:38,633] {scheduler_job.py:1484} DEBUG - Sleeping for 0.95 seconds to prevent excessive logging
[2021-02-23 22:58:39,331] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6797)
[2021-02-23 22:58:39,361] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6801)
[2021-02-23 22:58:39,589] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:39,589] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:39,591] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:39,591] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:39,592] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:39,593] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:39,594] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:39,596] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:39,597] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:40,305] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor414-Process, stopped)>
[2021-02-23 22:58:40,306] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor415-Process, stopped)>
[2021-02-23 22:58:40,599] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2021-02-23 22:58:41,349] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6829)
[2021-02-23 22:58:41,386] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6831)
[2021-02-23 22:58:41,595] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:41,595] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:41,596] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,597] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,598] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,599] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,600] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:41,601] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:41,602] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:41,602] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:41,604] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:41,604] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:41,607] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:41,608] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
```
EDIT: Tried it on Airflow 1.10.13 and same thing. Updated versions above.
[GitHub] [airflow] marclamberti edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
marclamberti edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-808933078
@kaxil I get the same issue locally with KinD: the scheduler is restarted every 5 minutes, as specified in scheduler-deployment.yaml.
```
# If the scheduler stops heartbeating for 5 minutes (10*30s) kill the
# scheduler and let Kubernetes restart it
livenessProbe:
  failureThreshold: 10
  periodSeconds: 30
  exec:
    command:
      - python
      - -Wignore
      - -c
      - |
        import os
        os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
        os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

        from airflow.jobs.scheduler_job import SchedulerJob
        from airflow.utils.db import create_session
        from airflow.utils.net import get_hostname
        import sys

        with create_session() as session:
            job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                SchedulerJob.latest_heartbeat.desc()).limit(1).first()
            sys.exit(0 if job.is_alive() else 1)
```
I ran the script in the scheduler pod and it returns True; however, the probe still keeps failing, and I haven't figured out why yet.
Even with the suggestions from @stoiandl it doesn't work.
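One thing that may be worth ruling out here (an assumption, not something confirmed in the thread): Kubernetes exec probes default to `timeoutSeconds: 1`, and the probe script imports a large part of Airflow before it can answer, which can easily take longer than a second on a loaded node. A timed-out exec probe would be consistent with the empty `Liveness probe failed:` messages in the events above. A sketch of the probe with an explicit, more generous timeout (the 20-second value is an arbitrary example):

```yaml
livenessProbe:
  failureThreshold: 10
  periodSeconds: 30
  # Exec probes default to timeoutSeconds: 1; the Airflow imports in the
  # probe script alone can take longer than that on a busy node.
  timeoutSeconds: 20
  exec:
    command: ["python", "-Wignore", "-c", "import sys; sys.exit(0)"]  # placeholder: use the full probe script from the deployment above
```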
[GitHub] [airflow] jedcunningham commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
jedcunningham commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-802040819
At least some of these seem to be caused by #11982.
[GitHub] [airflow] marclamberti commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
marclamberti commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-809663845
These are the last logs I get before the scheduler is killed by Kubernetes:
```
2021-03-29T19:46:05.9176489Z stdout F [2021-03-29 19:46:05,917] {base_executor.py:161} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
2021-03-29T19:46:05.9177509Z stdout F [2021-03-29 19:46:05,917] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
2021-03-29T19:46:05.9180176Z stdout F [2021-03-29 19:46:05,917] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
2021-03-29T19:46:05.9189036Z stdout F [2021-03-29 19:46:05,918] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
2021-03-29T19:46:05.9190065Z stdout F [2021-03-29 19:46:05,918] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
2021-03-29T19:46:05.9370131Z stdout F [2021-03-29 19:46:05,936] {base_job.py:219} DEBUG - [heartbeat]
2021-03-29T19:46:05.9371407Z stdout F [2021-03-29 19:46:05,936] {scheduler_job.py:1395} DEBUG - Next timed event is in 2.919806
2021-03-29T19:46:05.9372007Z stdout F [2021-03-29 19:46:05,937] {scheduler_job.py:1397} DEBUG - Ran scheduling loop in 0.05 seconds
2021-03-29T19:46:06.5618142Z stdout F [2021-03-29 19:46:06,561] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
2021-03-29T19:46:07.5643446Z stdout F [2021-03-29 19:46:07,563] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 46
2021-03-29T19:46:07.6988128Z stdout F [2021-03-29 19:46:07,698] {process_utils.py:66} INFO - Process psutil.Process(pid=46, status='terminated', exitcode=0, started='19:41:10') (46) terminated with exit code 0
2021-03-29T19:46:07.6993846Z stdout F [2021-03-29 19:46:07,699] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 46
2021-03-29T19:46:07.6994528Z stdout F [2021-03-29 19:46:07,699] {scheduler_job.py:1301} INFO - Exited execute loop
2021-03-29T19:46:07.7051995Z stdout F [2021-03-29 19:46:07,704] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
2021-03-29T19:46:07.7052466Z stdout F [2021-03-29 19:46:07,705] {settings.py:292} DEBUG - Disposing DB connection pool (PID 7)
```
[2021-02-23 22:58:37,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:37,592] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:37,593] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:37,602] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:37,604] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:37,605] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:37,607] {scheduler_job.py:1460} DEBUG - Heartbeating the scheduler
[2021-02-23 22:58:37,620] {base_job.py:197} DEBUG - [heartbeat]
[2021-02-23 22:58:37,630] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.05 seconds
[2021-02-23 22:58:37,631] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:38,165] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6769)
[2021-02-23 22:58:38,268] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6765)
[2021-02-23 22:58:38,276] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor412-Process, started)>
[2021-02-23 22:58:38,284] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor413-Process, stopped)>
[2021-02-23 22:58:38,633] {scheduler_job.py:1484} DEBUG - Sleeping for 0.95 seconds to prevent excessive logging
[2021-02-23 22:58:39,331] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6797)
[2021-02-23 22:58:39,361] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6801)
[2021-02-23 22:58:39,589] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:39,589] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:39,591] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:39,591] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:39,592] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:39,593] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:39,594] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:39,596] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:39,597] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:40,305] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor414-Process, stopped)>
[2021-02-23 22:58:40,306] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor415-Process, stopped)>
[2021-02-23 22:58:40,599] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2021-02-23 22:58:41,349] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6829)
[2021-02-23 22:58:41,386] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6831)
[2021-02-23 22:58:41,595] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:41,595] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:41,596] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,597] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,598] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,599] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,600] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:41,601] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:41,602] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:41,602] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:41,604] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:41,604] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:41,607] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:41,608] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780603101
It is not a liveness probe issue. For some reason the scheduler stops heartbeating. Normally I see heartbeats every 5 seconds in the logs.
Started scheduler pod at 14:26:21.
```
[2021-02-17 14:26:21,095] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f19ad429670>]
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-02-17 14:26:21,615] {executor_loader.py:82} DEBUG - Loading core executor: KubernetesExecutor
[2021-02-17 14:26:21,682] {scheduler_job.py:1241} INFO - Starting the scheduler
[2021-02-17 14:26:21,682] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
[2021-02-17 14:26:21,683] {kubernetes_executor.py:491} INFO - Start Kubernetes executor
[2021-02-17 14:26:21,683] {kubernetes_executor.py:495} DEBUG - Start with scheduler_job_id: 236
[2021-02-17 14:26:21,685] {kubernetes_executor.py:235} DEBUG - Creating Kubernetes executor
[2021-02-17 14:26:21,685] {kubernetes_executor.py:240} DEBUG - Kubernetes using namespace airflow
[2021-02-17 14:26:21,699] {kubernetes_executor.py:458} DEBUG - Clearing tasks that have not been launched
[2021-02-17 14:26:21,704] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 0
[2021-02-17 14:26:21,718] {kubernetes_executor.py:462} INFO - When executor started up, found 0 queued task instances
[2021-02-17 14:26:21,724] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 34
[2021-02-17 14:26:21,725] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
```
I can see the heartbeat running every 5 seconds until 14:31:22. Below are the last two heartbeats before the scheduler actually hangs.
```
[2021-02-17 14:31:17,266] {base_job.py:219} DEBUG - [heartbeat]
[2021-02-17 14:31:17,273] {scheduler_job.py:1390} DEBUG - Next timed event is in 4.999986
[2021-02-17 14:31:17,274] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.08 seconds
[2021-02-17 14:31:17,798] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3408)
[2021-02-17 14:31:17,805] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor481-Process' pid=3408 parent=34 stopped exitcode=0>
[2021-02-17 14:31:18,305] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:31:18,309] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:31:18,309] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:31:18,309] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:31:18,310] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:31:18,310] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:31:18,310] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:31:18,311] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:18,311] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:18,321] {scheduler_job.py:1390} DEBUG - Next timed event is in 3.951973
[2021-02-17 14:31:18,322] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.05 seconds
[2021-02-17 14:31:18,435] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3415)
[2021-02-17 14:31:18,444] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor482-Process' pid=3415 parent=34 stopped exitcode=0>
[2021-02-17 14:31:19,104] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3422)
[2021-02-17 14:31:19,113] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor483-Process' pid=3422 parent=34 stopped exitcode=0>
[2021-02-17 14:31:19,351] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:31:19,354] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:31:19,354] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:31:19,354] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:31:19,355] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:31:19,355] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:31:19,355] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:31:19,356] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:19,356] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:19,367] {scheduler_job.py:1390} DEBUG - Next timed event is in 2.906800
[2021-02-17 14:31:19,367] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.04 seconds
[2021-02-17 14:31:19,702] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3429)
[2021-02-17 14:31:19,710] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor484-Process' pid=3429 parent=34 stopped exitcode=0>
[2021-02-17 14:31:20,340] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3436)
[2021-02-17 14:31:20,348] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor485-Process' pid=3436 parent=34 stopped exitcode=0>
[2021-02-17 14:31:20,408] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:31:20,412] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:31:20,412] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:31:20,413] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:31:20,413] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:31:20,413] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:31:20,413] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:31:20,414] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:20,415] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:20,415] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:20,425] {scheduler_job.py:1390} DEBUG - Next timed event is in 1.848308
[2021-02-17 14:31:20,425] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.05 seconds
[2021-02-17 14:31:20,975] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3443)
[2021-02-17 14:31:20,983] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor486-Process' pid=3443 parent=34 stopped exitcode=0>
[2021-02-17 14:31:21,462] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:31:21,471] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:31:21,471] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:31:21,471] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:31:21,472] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:31:21,472] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:31:21,472] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:31:21,473] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:21,474] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:21,484] {scheduler_job.py:1390} DEBUG - Next timed event is in 0.789546
[2021-02-17 14:31:21,484] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.06 seconds
[2021-02-17 14:31:21,569] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3450)
[2021-02-17 14:31:21,579] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor487-Process' pid=3450 parent=34 stopped exitcode=0>
[2021-02-17 14:31:21,999] {serialized_dag.py:193} DEBUG - Deleting Serialized DAGs (for which DAG files are deleted) from serialized_dag table
[2021-02-17 14:31:22,004] {dag.py:2199} DEBUG - Deactivating DAGs (for which DAG files are deleted) from dag table
[2021-02-17 14:31:22,010] {dagcode.py:135} DEBUG - Deleting code from dag_code table
[2021-02-17 14:31:22,174] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3457)
[2021-02-17 14:31:22,183] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor488-Process' pid=3457 parent=34 stopped exitcode=0>
[2021-02-17 14:31:22,303] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:31:22,307] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:31:22,307] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:31:22,308] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:31:22,308] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:31:22,308] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:31:22,308] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:31:22,309] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:22,310] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:22,310] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:22,339] {base_job.py:219} DEBUG - [heartbeat]
[2021-02-17 14:31:22,348] {scheduler_job.py:1390} DEBUG - Next timed event is in 0.502616
[2021-02-17 14:31:22,348] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.07 seconds
[2021-02-17 14:31:22,868] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3464)
[2021-02-17 14:31:22,880] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor489-Process' pid=3464 parent=34 stopped exitcode=0>
[2021-02-17 14:31:22,886] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:31:22,894] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:31:22,894] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:31:22,894] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:31:22,895] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:31:22,895] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:31:22,896] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:31:22,897] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-02-17 14:31:22,907] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2021-02-17 14:31:23,487] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3471)
[2021-02-17 14:31:23,494] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor490-Process' pid=3471 parent=34 stopped exitcode=0>
[2021-02-17 14:31:24,111] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3478)
[2021-02-17 14:31:24,123] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor491-Process' pid=3478 parent=34 stopped exitcode=0>
[2021-02-17 14:31:24,747] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3485)
[2021-02-17 14:31:24,754] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor492-Process' pid=3485 parent=34 stopped exitcode=0>
[2021-02-17 14:31:25,376] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3492)
[2021-02-17 14:31:25,385] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor493-Process' pid=3492 parent=34 stopped exitcode=0>
[2021-02-17 14:31:26,022] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3499)
[2021-02-17 14:31:26,030] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor494-Process' pid=3499 parent=34 stopped exitcode=0>
[2021-02-17 14:31:26,608] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3506)
[2021-02-17 14:31:26,617] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor495-Process' pid=3506 parent=34 stopped exitcode=0>
[2021-02-17 14:31:27,241] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3513)
[2021-02-17 14:31:27,249] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor496-Process' pid=3513 parent=34 stopped exitcode=0>
[2021-02-17 14:31:27,852] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3520)
[2021-02-17 14:31:27,861] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor497-Process' pid=3520 parent=34 stopped exitcode=0>
[2021-02-17 14:31:28,454] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3527)
[2021-02-17 14:31:28,472] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor498-Process' pid=3527 parent=34 stopped exitcode=0>
[2021-02-17 14:31:29,086] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3534)
... many of these disposing db connections
```
In the frontend I get:
_The scheduler does not appear to be running. Last heartbeat was received 11 minutes ago.
The DAGs list may not update, and new tasks will not be scheduled._
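The staleness check behind both that UI banner and the liveness probe script can be sketched as follows. This is a hedged reconstruction, not Airflow's actual code: in Airflow 2.0, `BaseJob.is_alive()` compares the latest heartbeat against the job heartrate scaled by a grace multiplier (2.1 by default); treat the exact constant and the 5-second default heartrate here as assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Sketch of the heartbeat staleness check: a scheduler job is considered alive
# if its latest heartbeat is more recent than heartrate * grace_multiplier.
# The 2.1 multiplier mirrors Airflow 2.0's BaseJob.is_alive(), but is an
# assumption here, not a definitive reading of the implementation.
def is_alive(latest_heartbeat: datetime,
             heartrate_seconds: float = 5.0,
             grace_multiplier: float = 2.1,
             now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return (now - latest_heartbeat) < timedelta(
        seconds=heartrate_seconds * grace_multiplier)

now = datetime.now(timezone.utc)
print(is_alive(now - timedelta(seconds=4), now=now))   # fresh heartbeat
print(is_alive(now - timedelta(minutes=11), now=now))  # 11-minute gap
```

With an 11-minute gap, as in the banner above, any reasonable heartrate/multiplier combination reports the scheduler as dead, which is why the probe keeps restarting the pod even though the process itself is merely stuck.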
[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780665486
After removing the terminated pods, it works without restarts...
[GitHub] [airflow] kaxil commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
kaxil commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-804474746
I am going to close this since https://github.com/apache/airflow/issues/14261#issuecomment-784619202 fixed it for Azure Kubernetes
[GitHub] [airflow] kaxil closed issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
kaxil closed issue #14261:
URL: https://github.com/apache/airflow/issues/14261
[GitHub] [airflow] madden1706 commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
madden1706 commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-801177247
I am getting this issue on minikube:
```
[2021-03-17 14:54:07,522] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-03-17 14:54:07,522] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-03-17 14:54:07,523] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-03-17 14:54:07,523] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[2021-03-17 14:54:07,534] {scheduler_job.py:1390} DEBUG - Next timed event is in 3.959892
[2021-03-17 14:54:07,534] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.04 seconds
[2021-03-17 14:54:08,525] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
[2021-03-17 14:54:09,528] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 42
[2021-03-17 14:54:09,742] {process_utils.py:61} INFO - Process psutil.Process(pid=42, status='terminated', exitcode=0, started='14:49:18') (42) terminated with exit code 0
[2021-03-17 14:54:09,743] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 42
[2021-03-17 14:54:09,743] {scheduler_job.py:1296} INFO - Exited execute loop
[2021-03-17 14:54:09,749] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
[2021-03-17 14:54:09,749] {settings.py:290} DEBUG - Disposing DB connection pool (PID 9)
```
minikube version: v1.18.1
Restarts seem to be every 5 mins.
[GitHub] [airflow] stoiandl edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202
I managed to fix the restarts by setting the following configs:
```
[kubernetes]
...
kube_client_request_args = { "_request_timeout": 60 }
delete_option_kwargs = {"grace_period_seconds": 10}
enable_tcp_keepalive = True
tcp_keep_idle = 30
tcp_keep_intvl = 30
tcp_keep_cnt = 30
```
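For anyone deploying via Helm or a plain pod spec rather than editing `airflow.cfg`, the same settings can be supplied through Airflow's `AIRFLOW__<SECTION>__<KEY>` environment-variable scheme. A sketch (the variable names below are derived mechanically from the `[kubernetes]` keys above; double-check them against your Airflow version's config reference):

```shell
# [kubernetes] settings expressed as environment variables, e.g. for the
# scheduler container's env block in a Helm values file or Deployment spec.
export AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS='{ "_request_timeout": 60 }'
export AIRFLOW__KUBERNETES__DELETE_OPTION_KWARGS='{"grace_period_seconds": 10}'
export AIRFLOW__KUBERNETES__ENABLE_TCP_KEEPALIVE='True'
export AIRFLOW__KUBERNETES__TCP_KEEP_IDLE='30'
export AIRFLOW__KUBERNETES__TCP_KEEP_INTVL='30'
export AIRFLOW__KUBERNETES__TCP_KEEP_CNT='30'
```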
I have another Airflow instance running on Kubernetes in AWS, and that one runs fine on any version.
I realized the problem is specific to Azure Kubernetes and the REST API calls to the API server. I can still see some Kubernetes connection errors in the scheduler logs, but those are not fatal, since Airflow retries the connection and succeeds on the second attempt.
Now my logs are full of connection errors:
```
[2021-02-24 00:44:06,188] {kubernetes_executor.py:106} WARNING - There was a timeout error accessing the Kube API. Retrying request.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 436, in _error_catcher
yield
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 763, in read_chunked
self._update_chunk_length()
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 693, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 326, in recv_into
raise timeout("The read operation timed out")
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
self.resource_version = self._run(
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
for event in list_worker_pods():
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 792, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.
[2021-02-24 00:44:07,330] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 0
[2021-02-24 00:45:07,394] {kubernetes_executor.py:106} WARNING - There was a timeout error accessing the Kube API. Retrying request.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 436, in _error_catcher
yield
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 763, in read_chunked
self._update_chunk_length()
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 693, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 326, in recv_into
raise timeout("The read operation timed out")
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
self.resource_version = self._run(
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
for event in list_worker_pods():
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 792, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.
```
It is not the prettiest solution, but it works. This definitely needs to be addressed somehow.
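The watcher thread dies on that `ReadTimeoutError` and things only recover once the watch is reopened. A minimal, hypothetical sketch of the restart-on-timeout pattern (this is not Airflow's actual code; `make_stream` and `handle_event` are placeholders standing in for the kubernetes watch machinery):

```python
import socket

def watch_with_restart(make_stream, handle_event, max_restarts=3):
    """Consume a pod-event stream, reopening it when a read times out.

    make_stream:  zero-arg callable returning a fresh iterator of events
    handle_event: callback invoked once per event
    Both are placeholders for the real kubernetes watch plumbing.
    """
    restarts = 0
    while restarts <= max_restarts:
        try:
            for event in make_stream():
                handle_event(event)
            return  # stream ended cleanly
        except socket.timeout:
            restarts += 1  # API server went quiet; reopen the watch
    raise RuntimeError("watch kept timing out")
```

The point of the sketch is only that a quiet API server should trigger a watch restart rather than kill the whole watcher process.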
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780616603
15 mins later the scheduler comes back up:
```
[2021-02-17 14:47:02,093] {connectionpool.py:751} WARNING - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))': /api/v1/namespaces/airflow/pods?fieldSelector=status.phase%3DSucceeded&labelSelector=kubernetes_executor%3DTrue
```
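That retried URL is the executor's pod-adoption query: it lists Succeeded pods labelled `kubernetes_executor=True`. Purely as an illustration (a hypothetical reconstruction with the standard library, not Airflow code), the encoded path in the WARNING can be rebuilt like this:

```python
from urllib.parse import urlencode

# Hypothetical reconstruction of the request path seen in the WARNING above.
params = {
    "fieldSelector": "status.phase=Succeeded",    # finished worker pods
    "labelSelector": "kubernetes_executor=True",  # pods created by the executor
}
path = "/api/v1/namespaces/airflow/pods?" + urlencode(params)
print(path)
```

So the number of leftover Succeeded pods in the namespace directly affects how heavy this call is.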
It then dumps a bunch of pod config JSONs in debug mode, followed by:
```
[2021-02-17 14:47:03,060] {scheduler_job.py:1390} DEBUG - Next timed event is in 4.999955
[2021-02-17 14:47:03,060] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 940.21 seconds
[2021-02-17 14:47:04,090] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-17 14:47:04,096] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-17 14:47:04,096] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-17 14:47:04,096] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-17 14:47:04,096] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-17 14:47:04,096] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-17 14:47:04,096] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-17 14:47:04,098] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[... ~200 identical DagParsingStat DEBUG lines omitted ...]
[2021-02-17 14:47:04,134] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor676-Process' pid=4773 parent=34 stopped exitcode=0>
[2021-02-17 14:47:04,137] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[... ~40 identical DagParsingStat DEBUG lines omitted ...]
[2021-02-17 14:47:04,142] {serialized_dag.py:193} DEBUG - Deleting Serialized DAGs (for which DAG files are deleted) from serialized_dag table
[2021-02-17 14:47:04,143] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[... ~25 identical DagParsingStat DEBUG lines omitted ...]
[2021-02-17 14:47:04,147] {dag.py:2199} DEBUG - Deactivating DAGs (for which DAG files are deleted) from dag table
[2021-02-17 14:47:04,147] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
[... 5 more identical DagParsingStat DEBUG lines omitted ...]
[2021-02-17 14:47:04,152] {dagcode.py:135} DEBUG - Deleting code from dag_code table
[2021-02-17 14:47:04,174] {base_job.py:219} DEBUG - [heartbeat]
```
This might be because I have many terminated worker pods that were never deleted? They come from either a deleted DAG or the previous Airflow 1.10.13. I have cleared my namespace of those pods and will now rerun with the liveness probe enabled.
[GitHub] [airflow] jedcunningham commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
jedcunningham commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-815210940
@marclamberti, @madden1706, I believe the liveness probe `timeoutSeconds` and kube v1.20 are the culprits here. I've opened #15259 to address it.
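For context: since Kubernetes 1.20, `timeoutSeconds` is actually enforced for exec probes (earlier versions silently ignored it for exec), so a probe script that takes longer than the default 1 second now counts as a failure. A hedged illustration of raising it on the scheduler container (these are standard Kubernetes probe fields; the probe command path is a placeholder and the chart's real values may differ):

```yaml
livenessProbe:
  exec:
    command: ["python", "-Wignore", "/opt/airflow/probe.py"]  # placeholder path
  initialDelaySeconds: 10
  periodSeconds: 60
  timeoutSeconds: 20    # default is 1s; enforced for exec probes since k8s 1.20
  failureThreshold: 5
```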
[GitHub] [airflow] norfe commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
norfe commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-782901575
I'm seeing a similar issue when trying to run airflow on minikube.
```
$ minikube version
minikube version: v1.17.1
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
```
I can reproduce it always with:
```
$ minikube delete
$ minikube start
$ curl -OL https://github.com/apache/airflow/archive/master.zip
$ unzip master.zip
$ helm dep update airflow-master/chart/
$ helm install airflow ./airflow-master/chart
```
After running for a while, scheduler restarts periodically:
```
$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          17m
airflow-scheduler-5567f545c8-qv7cg   2/2     Running   3          17m
airflow-statsd-5556dc96bc-twbz9      1/1     Running   0          17m
airflow-webserver-65cc966d7c-68wnv   1/1     Running   0          17m
```
```
$ kubectl describe pod airflow-scheduler-5567f545c8-qv7cg
Events:
Type     Reason     Age                   From               Message
----     ------     ----                  ----               -------
Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/airflow-scheduler-5567f545c8-qv7cg to minikube
Normal   Pulling    18m                   kubelet            Pulling image "apache/airflow:2.0.0"
Normal   Pulled     17m                   kubelet            Successfully pulled image "apache/airflow:2.0.0" in 42.1015082s
Normal   Created    17m                   kubelet            Created container wait-for-airflow-migrations
Normal   Started    17m                   kubelet            Started container wait-for-airflow-migrations
Normal   Started    16m                   kubelet            Started container scheduler-gc
Normal   Pulled     16m                   kubelet            Container image "apache/airflow:2.0.0" already present on machine
Normal   Created    16m                   kubelet            Created container scheduler-gc
Normal   Killing    11m                   kubelet            Container scheduler failed liveness probe, will be restarted
Normal   Pulled     11m (x2 over 16m)     kubelet            Container image "apache/airflow:2.0.0" already present on machine
Normal   Started    11m (x2 over 16m)     kubelet            Started container scheduler
Normal   Created    11m (x2 over 16m)     kubelet            Created container scheduler
Warning  Unhealthy  3m20s (x27 over 16m)  kubelet            Liveness probe failed:
```
[GitHub] [airflow] careduz edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
careduz edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-785145637
Neat, thanks for the info @stoiandl - will give those settings a shot when we upgrade Airflow (TTL seems to be new for 2.0). Will share our experience afterwards.
[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780571043
What I am not sure about is whether I should remove the liveness probe for now. Is this actually an issue with the scheduler, given that I do not see any errors in the log?
[GitHub] [airflow] lewismc commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
lewismc commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-786154745
We are experiencing this when deploying the chart into a local installation of k3d.
```
% kubectl get pods -n airflow-test-local
NAME                                 READY   STATUS             RESTARTS   AGE
airflow-statsd-5556dc96bc-w28cz      1/1     Running            0          7m29s
airflow-postgresql-0                 1/1     Running            0          7m29s
airflow-webserver-7d5fbc5675-x6dc7   1/1     Running            0          7m29s
airflow-scheduler-7f59d9c69c-5v9pl   2/3     CrashLoopBackOff   7          7m29s
airflow-cleanup-1614276000-xbcmz     0/1     Completed          0          39s
airflow-scheduler-7f59d9c69c-cvzvx   2/3     CrashLoopBackOff   7          7m29s
```
We also found some interesting `WARNING`s when looking at the `wait-for-airflow-migrations` container...
```
% kubectl logs airflow-webserver-7d5fbc5675-x6dc7 -c wait-for-airflow-migrations -n airflow-test-local
BACKEND=postgresql
DB_HOST=airflow-postgresql.airflow-test-local.svc.cluster.local
DB_PORT=5432
....
[2021-02-25 17:53:43,435] {migration.py:163} INFO - Context impl PostgresqlImpl.
[2021-02-25 17:53:43,436] {migration.py:170} INFO - Will assume transactional DDL.
[2021-02-25 17:53:49,416] {providers_manager.py:299} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
[2021-02-25 17:53:50,300] {providers_manager.py:299} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
[2021-02-25 17:53:51,345] {<string>:35} INFO - Waiting for migrations... 1 second(s)
[2021-02-25 17:53:52,349] {<string>:35} INFO - Waiting for migrations... 2 second(s)
[2021-02-25 17:53:53,352] {<string>:35} INFO - Waiting for migrations... 3 second(s)
[2021-02-25 17:53:54,355] {<string>:35} INFO - Waiting for migrations... 4 second(s)
[2021-02-25 17:53:55,358] {<string>:35} INFO - Waiting for migrations... 5 second(s)
echiu@MT-308022 chart %
```
I don't think the Azure hooks should be imported by default...
[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-782423867
A few days later the issue appeared again...
[GitHub] [airflow] madden1706 commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
madden1706 commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-810093024
I can confirm @marclamberti's observation that these two import lines cause the probe to fail:
```
from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
```
After the issue was closed I assumed this was local to me, but either of these lines on its own causes the liveness probe to fail.
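For context, once those imports succeed the check the probe performs is just heartbeat recency: `job.is_alive()` asks whether the latest scheduler heartbeat is recent enough. A rough sketch of that logic (the 30-second threshold is an assumption for illustration only; Airflow derives the real grace period from the job's heart rate):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def scheduler_is_alive(latest_heartbeat: datetime,
                       threshold: timedelta = timedelta(seconds=30),
                       now: Optional[datetime] = None) -> bool:
    """Consider the scheduler alive if it heartbeat within `threshold` of now.

    The 30s default is illustrative; the real `job.is_alive()` computes
    its grace period from the scheduler's configured heart rate.
    """
    now = now or datetime.now(timezone.utc)
    return (now - latest_heartbeat) < threshold
```

The comparison itself is cheap; it is the Airflow imports that dominate the probe's runtime, which is why a slow import alone can trip a tight `timeoutSeconds`.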